Tabby

Senior ML/Data Ops Engineer II

Full-time

Location Type: Remote

Location: Serbia

About the role

LLM Serving & Model Management:
  • Deep expertise in high-throughput serving using vLLM, NVIDIA TensorRT-LLM, and sglang to minimize latency and maximize hardware efficiency.
  • Hands-on experience deploying and optimizing large-scale open-weight models, specifically DeepSeek 3.1/3.2, Qwen, and GPT-OSS variants.
  • Advanced optimization and security hardening of Docker, specifically for GPU environments.
  • Managing model weights and orchestration within Kubernetes (GKE) environments.

Real-Time Data Engineering & CDC:
  • Designing and maintaining high-throughput Change Data Capture (CDC) pipelines using the Apache ecosystem (e.g., Debezium, Kafka) to sync data from Cloud PostgreSQL.
  • Deploying and tuning ClickHouse for real-time analytics, ML feature storage, and high-speed logging.
  • Orchestrating complex ML data workflows using Airflow (Google Cloud Composer) to ensure data reliability.

Core Infrastructure & Networking:
  • Strong Linux systems expertise, including internals, networking, and performance tuning for large-scale distributed systems.
  • Experience with the Istio service mesh to manage microservices communication and traffic.
  • Provisioning and maintaining dedicated GPU nodes (A100/H100/H200/B200), including driver management and OS-level tuning using Ansible.
  • Solid Kubernetes expertise: controllers, CRDs, CNI, and Ingress.

CI/CD & Tooling:
  • Implementing pipelines as code in GitLab CI, including runner management, caching, and security scanning.
  • Infrastructure as Code with Terraform and Terragrunt.
  • Proficiency in Python/Bash for building custom automation and AI agent tooling.

Load Testing & Observability:
  • Conducting rigorous load testing for GenAI applications, focusing on metrics such as TTFT, TPS, and RPS.
  • Deploying and managing LiteLLM Gateway for unified API access, load balancing, and cost tracking.
  • Experience with Datadog for monitoring GPU utilization, inference health, and log pipelines.

Soft Skills:
  • Strong ownership mindset: balancing speed, reliability, and cost.
  • Comfortable working cross-functionally with developers, security, and compliance teams.
  • Excellent sense of responsibility and accountability.
  • English B2 or higher.

Nice to Have:
  • Experience with PCI-DSS, SOC 2, or other regulated compliance environments.

Our Tech Stack: Linux, Docker, Kubernetes, GCP (GKE, Cloud PostgreSQL), Datadog, GitLab, Apache CDC, ClickHouse, Airflow, Istio, Terraform, Terragrunt, Ansible, vLLM, TensorRT-LLM, sglang, LiteLLM, DeepSeek, Qwen, Go, Python

Benefits

  • Full-time B2B contract
  • Fully remote setup, work from anywhere in Europe
  • Up to 20% tax allowance
  • 22 paid leave days annually
  • Stock options (ESOP) in a fast-scaling, pre-IPO company
  • Flexi benefits you can use for wellness, travel, or learning
  • Work alongside a high-performing, international engineering team in a global fintech unicorn
  • Relocation support is available to our hubs in Armenia, Georgia, Serbia, and Spain, including flights, temporary accommodation, and legal setup.
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
high-throughput serving, vLLM, NVIDIA TensorRT-LLM, sglang, Docker, Kubernetes, CDC, Apache, ClickHouse, Python
Soft Skills
ownership mindset, cross-functional collaboration, responsibility, accountability, communication