MLOps Engineer

White Circle

MLOps Engineer integrating and maintaining AI models in production for an AI safety company. The role oversees deployment pipelines and ensures model quality and performance.

Posted 6/29/2026 • Full-time • Paris, 🇫🇷 France • Mid-Level, Senior • 💰 $100,000 - $200,000 per year

Tech Stack

Tools & technologies

Kubernetes, Node.js, Rust

About the role

Key responsibilities & impact

Integrate new text and multimodal models into our serving paths and verify they behave correctly under production-like traffic.
Build and maintain rollout pipelines for frequent model releases.
Create smoke, quality, and performance gates for model promotion.
Operate local and cluster GPU deployments on Kubernetes.
Build dashboards for latency, throughput, queue depth, GPU usage, fallback rate, and quality drift.
Run A/B and canary rollouts for model, prompt, routing, and serving config changes.
Debug production issues across model config, tokenizer, serving API, router, queue, Kubernetes, GPU runtime, and CI jobs.
Optimize serving cost and reliability across mixed GPU capacity.

Requirements

What you’ll need

Experience with an inference serving engine such as SGLang, vLLM, Dynamo, or TensorRT-LLM, and a working understanding of the request lifecycle through gateway, router, frontend, worker, queue, and model engine.
Solid Kubernetes GPU experience: NVIDIA device plugin, GPU scheduling, resource requests/limits, node affinity, taints, tolerations, and node pools.
Understanding of multi-node communication libraries and kernels, CUDA runtime, and container runtime compatibility, and the ability to debug across those layers.
Ability to design and implement CI/CD for model serving: image and config versioning, smoke tests, quality regression tests against benchmarks, latency/throughput gates, canary rollout, and rollback.
Strong observability instincts: you can define the dashboards and alerts that decide whether a model gets promoted or rolled back (p50/p95/p99 latency, TTFT, TPOT, queue depth, GPU utilization/memory, error/timeout/OOM rates, fallback rate, route distribution, canary vs. baseline, cost per successful request).
Production debugging across the whole stack from Rust to k8s configs.
Clear communication of engineering tradeoffs.

Benefits

Comp & perks

Paid time off in line with your local regulations, no matter where you work from
Work from Paris (hybrid) with a relocation package available, or work from London (note: we are unable to provide relocation support for London-based roles)
Comprehensive medical insurance for our France-based team (please note that we are in the process of setting up our UK office and therefore cannot offer medical insurance for London-based roles yet)
All the hardware, tools, and services you need
Covered subscriptions for AI agents and IDEs
Team off-sites twice a year: we’ve recently been to the Alps and to Saint-Tropez

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Model Integration, Kubernetes, CI/CD, CUDA Runtime, Debugging, Performance Optimization, Quality Regression Testing, A/B Testing, Multimodal Models, Resource Management

Soft Skills

Clear Communication, Problem-Solving