MLOps Engineer
White Circle
MLOps Engineer integrating and maintaining AI models in production for an AI Safety company. Overseeing deployment pipelines and ensuring model quality and performance metrics.
Tech Stack
Tools & technologies: Kubernetes, Node.js, Rust
About the role
Key responsibilities & impact
- Integrate new text and multimodal models into our serving paths and verify they behave correctly under production-like traffic.
- Build and maintain rollout pipelines for frequent model releases.
- Create smoke, quality, and performance gates for model promotion.
- Operate local and cluster GPU deployments on Kubernetes.
- Build dashboards for latency, throughput, queue depth, GPU usage, fallback rate, and quality drift.
- Run A/B and canary rollouts for model, prompt, routing, and serving config changes.
- Debug production issues across model config, tokenizer, serving API, router, queue, Kubernetes, GPU runtime, and CI jobs.
- Optimize serving cost and reliability across mixed GPU capacity.
Requirements
What you’ll need
- Experience with an inference serving engine such as SGLang, vLLM, Dynamo, or TensorRT-LLM, and a working understanding of the request lifecycle through gateway, router, frontend, worker, queue, and model engine.
- Solid Kubernetes GPU experience: NVIDIA device plugin, GPU scheduling, resource requests/limits, node affinity, taints, tolerations, and node pools.
- Understanding of multi-node communication libraries and kernels, CUDA runtime, and container runtime compatibility, and the ability to debug across those layers.
- Ability to design and implement CI/CD for model serving: image and config versioning, smoke tests, quality regression tests against benchmarks, latency/throughput gates, canary rollout, and rollback.
- Strong observability instincts: you can define the dashboards and alerts that decide whether a model gets promoted or rolled back (p50/p95/p99 latency, TTFT, TPOT, queue depth, GPU utilization/memory, error/timeout/OOM rates, fallback rate, route distribution, canary vs. baseline, cost per successful request).
- Production debugging across the whole stack from Rust to k8s configs.
- Clear communication of engineering tradeoffs.
Benefits
Comp & perks
- Paid time off in line with your local regulations, no matter where you work from
- Work from Paris (hybrid) with a relocation package available, or work from London (note: we are unable to provide relocation support for London-based roles)
- Comprehensive medical insurance for our France-based team (please note that we are in the process of setting up our UK office and therefore cannot offer medical insurance for London-based roles yet)
- All the hardware, tools, and services you need
- Covered subscriptions for AI agents and IDEs
- Team off-sites twice a year: we’ve recently been to the Alps and to Saint-Tropez