Salary
💰 $100,000 - $720,000 per year
Tech Stack
AWS, Cloud, Docker, Java, Kubernetes, Python, Ray, Scala
About the role
- Design, build, and operate next-generation systems that run large-scale batch inference workloads (from minutes to multi-day jobs).
- Build developer-friendly APIs, SDKs, and CLIs that let researchers and engineers submit and manage batch inference jobs with minimal effort (see the sketch after this list).
- Implement and run the distributed services that package, schedule, execute, and monitor these workflows at massive scale.
- Instrument the platform for reliability, debuggability, observability, and cost control; define SLOs and share an equitable on-call rotation.
- Partner with content and studio ML practitioners to support model inference needs and workflows.
- Foster a culture of engineering excellence through design reviews, mentorship, and candid, constructive feedback.
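To make the API/SDK bullet above concrete, here is a minimal, purely hypothetical sketch of a submit-and-poll job client in Python. Every name in it (`BatchInferenceClient`, `JobSpec`, `submit`, `wait`) is illustrative and not an existing library; a real client would call a job service rather than stub out the responses.

```python
"""Hypothetical sketch of the kind of SDK surface this role would build."""
from dataclasses import dataclass
import time
import uuid


@dataclass
class JobSpec:
    model: str                    # model identifier to run
    input_uri: str                # where the batch inputs live (e.g., an S3 prefix)
    output_uri: str               # where predictions should be written
    max_runtime_hours: int = 24   # jobs range from minutes to multi-day


class BatchInferenceClient:
    """Minimal illustration of a submit-and-poll batch job API."""

    def submit(self, spec: JobSpec) -> str:
        # A real client would POST the spec to the job service;
        # here we only mint a job id.
        return f"job-{uuid.uuid4().hex[:8]}"

    def status(self, job_id: str) -> str:
        # A real client would query the scheduler; stubbed as terminal here.
        return "SUCCEEDED"

    def wait(self, job_id: str, poll_seconds: int = 30) -> str:
        # Poll until the job reaches a terminal state.
        while (state := self.status(job_id)) not in ("SUCCEEDED", "FAILED"):
            time.sleep(poll_seconds)
        return state


if __name__ == "__main__":
    client = BatchInferenceClient()
    job_id = client.submit(JobSpec(
        model="demo-model",
        input_uri="s3://example-bucket/inputs/",
        output_uri="s3://example-bucket/outputs/",
    ))
    print(job_id, client.wait(job_id))
```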
Requirements
- Hands-on experience with ML engineering or production systems involving training or inference of deep-learning models.
- Proven track record of operating scalable infrastructure for ML workloads (batch or online).
- Proficiency in one or more modern backend languages (e.g., Python, Java, Scala).
- Production experience with containerization & orchestration (Docker, Kubernetes, ECS, etc.) and at least one major cloud provider (AWS preferred).
- Commitment to operational best practices—observability, logging, incident response, and on-call excellence.
- Comfortable with ambiguity and working across multiple layers of the tech stack to execute on both 0-to-1 and 1-to-100 projects.
- Excellent written and verbal communication skills; comfortable collaborating with peers and partners distributed across US geographies and time zones.
- (Preferred) Deep understanding of real-world ML development workflows, built through close partnership with ML researchers or modeling engineers.
- (Preferred) Familiarity with cloud-based AI/ML services (SageMaker, Bedrock, Databricks, OpenAI, Vertex) or open-source stacks (Ray, Kubeflow, MLflow); a brief Ray sketch follows this list.
- (Preferred) Experience optimizing inference for large language models, computer-vision pipelines, or other foundation models (FSDP, tensor/pipeline parallelism, quantization, distillation).
- (Preferred) Open-source contributions, patents, or public speaking/blogging on ML-infrastructure topics.
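Since the stack above calls out Ray, here is a minimal sketch of fanning a batch inference workload out across a Ray cluster. The scoring logic is a placeholder, and the input batches are hypothetical; only the Ray calls (`ray.init`, `@ray.remote`, `.remote(...)`, `ray.get`) are real API.

```python
import ray

ray.init()  # connect to (or locally start) a Ray cluster


@ray.remote
def run_inference(batch):
    # Placeholder scoring logic; a real task would load a model
    # (or reuse one cached per worker) and return predictions.
    return [len(record) for record in batch]


# Hypothetical pre-sharded input batches.
batches = [["alpha", "beta"], ["gamma"], ["delta", "epsilon"]]

# Fan out one task per batch, then gather the results.
futures = [run_inference.remote(batch) for batch in batches]
results = ray.get(futures)
print(results)  # [[5, 4], [5], [5, 7]]
```

In practice a batch-inference platform would layer scheduling, retries, and observability on top of this fan-out pattern rather than calling `ray.get` directly from a driver script.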