Tabby

Senior ML/Data Ops Engineer II

Full-time

Location Type: Remote

Location: Serbia

About the role

LLM Serving & Model Management:
  • Deep expertise in high-throughput serving using vLLM, NVIDIA TensorRT-LLM, and sglang to minimize latency and maximize hardware efficiency.
  • Hands-on experience deploying and optimizing large-scale open-weight models, specifically DeepSeek 3.1/3.2, Qwen, and GPT-OSS variants.
  • Advanced optimization and security hardening of Docker, specifically for GPU environments.
  • Managing model weights and orchestration within Kubernetes (GKE) environments.

Real-Time Data Engineering & CDC:
  • Designing and maintaining high-throughput Change Data Capture (CDC) pipelines using the Apache ecosystem (e.g., Debezium, Kafka) to sync data from Cloud PostgreSQL.
  • Deploying and tuning ClickHouse for real-time analytics, ML feature storage, and high-speed logging.
  • Orchestrating complex ML data workflows using Airflow (Google Cloud Composer) to ensure data reliability.

Core Infrastructure & Networking:
  • Strong Linux systems expertise, including internals, networking, and performance tuning for large-scale distributed systems.
  • Experience with the Istio service mesh to manage microservices communication and traffic.
  • Provisioning and maintaining dedicated GPU nodes (A100/H100/H200/B200), including driver management and OS-level tuning using Ansible.
  • Solid Kubernetes expertise: controllers, CRDs, CNI, and Ingress.

CI/CD & Tooling:
  • Implementing pipelines as code in GitLab CI, including runner management, caching, and security scanning.
  • Infrastructure as Code with Terraform and Terragrunt.
  • Proficiency in Python/Bash for building custom automation and AI agent tooling.

Load Testing & Observability:
  • Conducting rigorous load testing for GenAI applications, focusing on metrics such as TTFT, TPS, and RPS.
  • Deploying and managing LiteLLM Gateway for unified API access, load balancing, and cost tracking.
  • Experience with Datadog for monitoring GPU utilization, inference health, and log pipelines.

Soft Skills:
  • Strong ownership mindset: balancing speed, reliability, and cost.
  • Comfortable working cross-functionally with developers, security, and compliance teams.
  • Excellent sense of responsibility and accountability.
  • English B2 or higher.

Nice to Have:
  • Experience with PCI-DSS, SOC 2, or other regulated compliance environments.

Our Tech Stack: Linux, Docker, Kubernetes, GCP (GKE, Cloud PostgreSQL), Datadog, GitLab, Apache CDC, ClickHouse, Airflow, Istio, Terraform, Terragrunt, Ansible, vLLM, TensorRT-LLM, sglang, LiteLLM, DeepSeek, Qwen, Go, Python

Benefits

  • Full-time B2B contract
  • Fully remote setup, work from anywhere in Europe
  • Up to 20% tax allowance
  • 22 paid leave days annually
  • Stock options (ESOP) in a fast-scaling, pre-IPO company
  • Flexi benefits you can use for wellness, travel, or learning
  • Work alongside a high-performing, international engineering team in a global fintech unicorn
  • Relocation support is available to our hubs in Armenia, Georgia, Serbia, and Spain, including flights, temporary accommodation, and legal setup.
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
high-throughput serving, vLLM, NVIDIA TensorRT-LLM, sglang, Docker, Kubernetes, CDC, Apache, ClickHouse, Python
Soft Skills
ownership mindset, cross-functional collaboration, responsibility, accountability, communication