
Senior ML/Data Ops Engineer II
Tabby
full-time
Location Type: Remote
Location: Serbia
About the role
- LLM Serving & Model Management:
- Deep expertise in high-throughput serving using vLLM, NVIDIA TensorRT-LLM, and sglang to minimize latency and maximize hardware efficiency (an illustrative vLLM sketch follows this list).
- Hands-on experience deploying and optimizing large-scale open-weights models, specifically DeepSeek 3.1/3.2, Qwen, and GPT-OSS variants.
- Advanced Docker optimization and security hardening, specifically for GPU environments.
- Managing model weights and orchestration within Kubernetes (GKE) environments.
- Real-Time Data Engineering & CDC:
- Designing and maintaining high-throughput CDC (Change Data Capture) pipelines using the Apache ecosystem (e.g., Debezium, Kafka) to sync data from Cloud PostgreSQL.
- Deploying and tuning ClickHouse for real-time analytics, ML feature storage, and high-speed logging (see the ClickHouse sketch after this list).
- Orchestrating complex ML data workflows using Airflow (Google Cloud Composer) to ensure data reliability.
- Core Infrastructure & Networking:
- Strong Linux systems expertise including internals, networking, and performance tuning for large-scale distributed systems.
- Experience with Istio service mesh to manage microservices communication and traffic.
- Provisioning and maintaining dedicated GPU nodes (A100/H100/H200/B200), including driver management and OS-level tuning using Ansible.
- Solid Kubernetes expertise: controllers, CRDs, CNI, and Ingress.
- CI/CD & Tooling:
- Implementing pipelines as code within GitLab CI, managing runners, caching, and security scanning.
- Infrastructure as Code with Terraform and Terragrunt.
- Proficiency in Python/Bash for building custom automation and AI Agent tooling.
- Load Testing & Observability:
- Conducting rigorous load testing for GenAI applications, focusing on metrics like TTFT, TPS, and RPS (a measurement sketch follows this list).
- Deploying and managing LiteLLM Gateway for unified API access, load balancing, and cost tracking.
- Experience with Datadog for monitoring GPU utilization, inference health, and log pipelines.
- Soft Skills:
- Strong ownership mindset: balancing speed, reliability, and cost.
- Comfortable working cross-functionally with developers, security, and compliance.
- Excellent sense of responsibility and accountability.
- English B2 or higher.
- Nice to Have:
- Experience working in PCI-DSS, SOC 2, or other regulatory compliance environments.
- Our Tech Stack: Linux, Docker, Kubernetes, GCP (GKE, Cloud PostgreSQL), Datadog, GitLab, Apache CDC, ClickHouse, Airflow, Istio, Terraform, Terragrunt, Ansible, vLLM, TensorRT-LLM, sglang, LiteLLM, DeepSeek, Qwen, Go, Python
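
To ground the serving bullet above, here is a minimal offline-batching sketch with vLLM. The model name, parallelism, and sampling values are illustrative placeholders, not a description of Tabby's actual deployment.

```python
# Minimal vLLM sketch: offline batch generation with an open-weights model.
# Model name, tensor_parallel_size, and sampling values are illustrative
# placeholders, not Tabby's actual configuration.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize yesterday's payment failures in one paragraph.",
    "Explain BNPL in one sentence.",
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",   # placeholder open-weights model
    tensor_parallel_size=1,             # raise to shard across GPUs on a node
    gpu_memory_utilization=0.90,        # leave headroom for the KV cache
)

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text.strip())
```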
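The ClickHouse analytics/logging bullet can be pictured with a small sketch using the clickhouse-connect client. The host, table name, and schema are hypothetical, invented only to illustrate the use case.

```python
# Sketch: pushing inference logs into ClickHouse and reading them back.
# Host, table name, and schema are hypothetical, chosen only to illustrate
# the "real-time analytics / high-speed logging" use case.
from datetime import datetime

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

client.command("""
    CREATE TABLE IF NOT EXISTS inference_logs (
        ts      DateTime,
        model   String,
        ttft_ms Float64,
        tokens  UInt32
    ) ENGINE = MergeTree ORDER BY ts
""")

client.insert(
    "inference_logs",
    [[datetime(2024, 1, 1, 12, 0), "qwen-7b", 85.3, 412]],
    column_names=["ts", "model", "ttft_ms", "tokens"],
)

result = client.query(
    "SELECT model, avg(ttft_ms) AS avg_ttft FROM inference_logs GROUP BY model"
)
for row in result.result_rows:
    print(row)
```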
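For the load-testing metrics above, a minimal TTFT/TPS measurement loop against an OpenAI-compatible endpoint (for example one fronted by a LiteLLM gateway or a vLLM server) might look like this. The base URL, API key, and model alias are placeholder assumptions.

```python
# Sketch: measuring TTFT (time to first token) and TPS (tokens per second)
# against an OpenAI-compatible endpoint such as a LiteLLM gateway or a vLLM
# server. Base URL, API key, and model alias are placeholder assumptions.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-placeholder")

start = time.perf_counter()
first_token_at = None
chunks_with_text = 0

stream = client.chat.completions.create(
    model="qwen-7b",  # placeholder model alias configured on the gateway
    messages=[{"role": "user", "content": "Describe a BNPL checkout flow."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks_with_text += 1  # rough proxy: one streamed chunk ~ one token

elapsed = time.perf_counter() - start
ttft_s = (first_token_at - start) if first_token_at else elapsed
print(f"TTFT: {ttft_s * 1000:.0f} ms, ~TPS: {chunks_with_text / elapsed:.1f}")
```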
Benefits
- Full-time B2B contract
- Fully remote setup, work from anywhere in Europe
- Up to 20% tax allowance
- 22 paid leave days annually
- Stock options (ESOP) in a fast-scaling, pre-IPO company
- Flexi benefits you can use for wellness, travel, or learning
- Work alongside a high-performing, international engineering team in a global fintech unicorn
- Relocation support is available to our hubs in Armenia, Georgia, Serbia, and Spain, including flights, temporary accommodation, and legal setup.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
high-throughput serving, vLLM, NVIDIA TensorRT-LLM, sglang, Docker, Kubernetes, CDC, Apache, ClickHouse, Python
Soft Skills
ownership mindset, cross-functional collaboration, responsibility, accountability, communication