Senior ML Ops Engineer
KAYAK
Tech Stack
Tools & technologies: Docker, Grafana, Kubernetes, Linux, Prometheus, Python
About the role
Key responsibilities & impact
- Build and maintain ML infrastructure end-to-end: Extend and operate the infrastructure that powers every model we ship, including CI/CD pipelines, model orchestration, and automated training pipelines designed to scale reliably without manual intervention.
- Own model deployment and serving: Help define and evolve the standards and tooling for model serving, ensuring low latency and high availability across our ML services.
- Develop core MLOps capabilities: Establish and maintain essential infrastructure as reliable, self-service systems for the entire machine learning organization, with a focus on feature stores, model registries, and automated monitoring of model performance and data drift.
- Operationalize infrastructure for the ML team: Collaborate with Operations to enable Kubernetes (k8s) autoscaling and GPU provisioning, and turn these capabilities into accessible, self-service tools for ML practitioners. This includes standing up and operating a Kubernetes-based development cluster and taking models from experimentation to GPU-backed production.
- Improve platform reliability and performance: Partner with Operations to design resilient monitoring using advanced observability tooling. Define service-level objectives and implement automation to reduce manual interventions and improve system reliability.
- Empower Data Scientists through standardized, optimized workflows: Amplify the impact of the ML team by building clear, well-supported "golden paths" (standardized workflows that streamline the model development lifecycle), so Data Scientists can focus on modeling while you handle the infrastructure.
Requirements
What you’ll need
- Experience building and operating ML platforms in production environments.
- Solid working knowledge of containerization and orchestration (Docker, Kubernetes), Linux internals, and model serving at scale.
- Familiarity with ML lifecycle tooling, including orchestration frameworks, feature stores, model registries, and drift or performance monitoring.
- Experience owning production systems: defining service-level objectives (SLOs), building observability (for example, using tools such as Prometheus, Grafana, or Datadog), participating in incident response, and diagnosing large-scale failures systematically. You look for opportunities to automate repetitive work rather than absorb it.
- Comfort writing production-quality code in Python or a comparable language.
- Experience modernizing production infrastructure with attention to reliability, risk, and cost — including thoughtful sequencing of work to maintain availability and continuity for live systems.
- The ability to take ownership of technical outcomes, advocate for decisions using data, and communicate clearly in writing and in person — to both technical and non-technical audiences.
Benefits
Comp & perks
- Work from (almost) anywhere for up to 20 days per year
- Focus on mental health and well-being:
  - Company-paid therapy sessions through SpringHealth
  - Company-paid subscription to HeadSpace
  - A company-wide week off each year, so the whole team fully recharges (and returns without a pile-up of work!)
  - No meeting Fridays
  - Paid parental leave
  - Paid volunteer time
- Focus on your career growth:
  - Development Dollars
  - Leadership development
  - Access to thousands of on-demand e-learnings
- Travel discounts
- Employee Resource Groups
- 6 weeks paid vacation + a day off for your birthday
- Free lunch 2 days per week
- Pension plan contributions
- Public transportation subsidies
- Bike leasing program
- Monthly social events, Thursday happy hours, sports teams
- An awesome office in Friedrichshain, Berlin
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
ML Infrastructure, CI/CD Pipelines, Model Orchestration, Containerization, Orchestration Frameworks, Feature Stores, Model Registries, Performance Monitoring, Service-Level Objectives (SLOs), Production-Quality Code
Soft Skills
Clear Communication, Ownership of Technical Outcomes, Advocacy Using Data