Senior ML Ops Engineer

KAYAK

Posted 7/3/2026 | full-time | 🇩🇪 Germany | Senior

Tech Stack

Tools & technologies
Docker, Grafana, Kubernetes, Linux, Prometheus, Python

About the role

Key responsibilities & impact
  • Build and maintain ML infrastructure end-to-end: Extend and operate the infrastructure that powers every model we ship — including CI/CD pipelines, model orchestration, and automated training pipelines designed to scale reliably without manual intervention.
  • Own model deployment and serving: Help define and evolve the standards and tooling for model serving, ensuring low latency and high availability across our ML services.
  • Develop core MLOps capabilities: Establish and maintain essential infrastructure that functions as reliable, self-service systems for the entire machine learning organization — with a focus on feature stores, model registries, and automated monitoring for performance and data drift.
  • Operationalize infrastructure for the ML team: Collaborate with Operations to enable Kubernetes (k8s) autoscaling and GPU provisioning, turning these into accessible, self-service tools for ML practitioners — including standing up and operating a Kubernetes-based development cluster and taking models from experimentation to GPU-backed production.
  • Improve platform reliability and performance: Partner with Operations to design resilient monitoring using advanced observability tooling. Define service-level objectives and implement automation to reduce manual interventions and improve system reliability.
  • Empower Data Scientists through standardized, optimized workflows: Amplify the impact of the ML team by building clear, well-supported "golden paths" — standardized workflows that streamline the model development lifecycle and let Data Scientists focus on modeling while you handle the infrastructure.

Requirements

What you’ll need
  • Experience building and operating ML platforms in production environments.
  • Solid working knowledge of containerization and orchestration (Docker, Kubernetes), Linux internals, and model serving at scale.
  • Familiarity with ML lifecycle tooling, including orchestration frameworks, feature stores, model registries, and drift or performance monitoring.
  • Experience owning production systems: defining service-level objectives (SLOs), building observability (for example, using tools such as Prometheus, Grafana, or Datadog), participating in incident response, and diagnosing large-scale failures systematically. You look for opportunities to automate repetitive work rather than absorb it.
  • Comfort writing production-quality code in Python or a comparable language.
  • Experience modernizing production infrastructure with attention to reliability, risk, and cost — including thoughtful sequencing of work to maintain availability and continuity for live systems.
  • The ability to take ownership of technical outcomes, advocate for decisions using data, and communicate clearly in writing and in person — to both technical and non-technical audiences.

Benefits

Comp & perks
  • Work from (almost) anywhere for up to 20 days per year
  • Focus on mental health and well-being:
      • Company-paid therapy sessions through SpringHealth
      • Company-paid subscription to HeadSpace
  • One company-wide week off per year, so the whole team fully recharges (and returns without a pile-up of work!)
  • No-meeting Fridays
  • Paid parental leave
  • Paid volunteer time
  • Focus on your career growth:
      • Development Dollars
      • Leadership development
      • Access to thousands of on-demand e-learnings
  • Travel Discounts
  • Employee Resource Groups
  • 6 weeks paid vacation + a day off for your birthday
  • Free lunch 2 days per week
  • Pension plan contributions
  • Public transportation subsidies
  • Bike leasing program
  • Monthly social events, Thursday happy hours, sports teams
  • An awesome office in Friedrichshain, Berlin

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
ML Infrastructure, CI/CD Pipelines, Model Orchestration, Containerization, Orchestration Frameworks, Feature Stores, Model Registries, Performance Monitoring, Service-Level Objectives (SLOs), Production-Quality Code
Soft Skills
Clear Communication, Ownership of Technical Outcomes, Advocacy Using Data