Salary
💰 $100,000 - $720,000 per year
Tech Stack
AWS, Cloud, Docker, Java, Kubernetes, Python, Ray, Scala
About the role
- Design, build, and operate next-generation systems that run large-scale batch inference workloads (from minutes to multi-day jobs).
- Build developer-friendly APIs, SDKs, and CLIs that let researchers and engineers submit and manage batch inference jobs with minimal effort (see the sketch after this list).
- Implement and run the distributed services that package, schedule, execute, and monitor these workflows at massive scale.
- Instrument the platform for reliability, debuggability, observability, and cost control; define SLOs and share an equitable on-call rotation.
- Partner with content and studio ML practitioners to support model inference needs and workflows.
- Foster a culture of engineering excellence through design reviews, mentorship, and candid, constructive feedback.
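To make the API/SDK bullet above concrete, here is a minimal, purely hypothetical sketch of a submit-and-poll job client in Python. Every name in it (`BatchInferenceClient`, `JobSpec`, `submit`, `wait`) is illustrative and not an existing library; a real client would call a job service rather than stub out the responses.

```python
"""Hypothetical sketch of the kind of SDK surface this role would build."""
from dataclasses import dataclass
import time
import uuid


@dataclass
class JobSpec:
    model: str                    # model identifier to run
    input_uri: str                # where the batch inputs live (e.g., an S3 prefix)
    output_uri: str               # where predictions should be written
    max_runtime_hours: int = 24   # jobs range from minutes to multi-day


class BatchInferenceClient:
    """Minimal illustration of a submit-and-poll batch job API."""

    def submit(self, spec: JobSpec) -> str:
        # A real client would POST the spec to the job service;
        # here we only mint a job id.
        return f"job-{uuid.uuid4().hex[:8]}"

    def status(self, job_id: str) -> str:
        # A real client would query the scheduler; stubbed as terminal here.
        return "SUCCEEDED"

    def wait(self, job_id: str, poll_seconds: int = 30) -> str:
        # Poll until the job reaches a terminal state.
        while (state := self.status(job_id)) not in ("SUCCEEDED", "FAILED"):
            time.sleep(poll_seconds)
        return state


if __name__ == "__main__":
    client = BatchInferenceClient()
    job_id = client.submit(JobSpec(
        model="demo-model",
        input_uri="s3://example-bucket/inputs/",
        output_uri="s3://example-bucket/outputs/",
    ))
    print(job_id, client.wait(job_id))
```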
Requirements
- Hands-on experience with ML engineering or production systems involving training or inference of deep-learning models.
- Proven track record of operating scalable infrastructure for ML workloads (batch or online).
- Proficiency in one or more modern backend languages (e.g., Python, Java, Scala).
- Production experience with containerization & orchestration (Docker, Kubernetes, ECS, etc.) and at least one major cloud provider (AWS preferred).
- Commitment to operational best practices—observability, logging, incident response, and on-call excellence.
- Comfortable with ambiguity and working across multiple layers of the tech stack to execute on both 0-to-1 and 1-to-100 projects.
- Excellent written and verbal communication skills; comfortable collaborating with peers and partners distributed across US geographies and time zones.
- (Preferred) Deep understanding of real-world ML development workflows, built through close partnership with ML researchers or modeling engineers.
- (Preferred) Familiarity with cloud-based AI/ML services (SageMaker, Bedrock, Databricks, OpenAI, Vertex) or open-source stacks (Ray, Kubeflow, MLflow); a brief Ray sketch follows this list.
- (Preferred) Experience optimizing inference for large language models, computer-vision pipelines, or other foundation models (FSDP, tensor/pipeline parallelism, quantization, distillation).
- (Preferred) Open-source contributions, patents, or public speaking/blogging on ML-infrastructure topics.
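Since the stack above calls out Ray, here is a minimal sketch of fanning a batch inference workload out across a Ray cluster. The scoring logic is a placeholder, and the input batches are hypothetical; only the Ray calls (`ray.init`, `@ray.remote`, `.remote(...)`, `ray.get`) are real API.

```python
import ray

ray.init()  # connect to (or locally start) a Ray cluster


@ray.remote
def run_inference(batch):
    # Placeholder scoring logic; a real task would load a model
    # (or reuse one cached per worker) and return predictions.
    return [len(record) for record in batch]


# Hypothetical pre-sharded input batches.
batches = [["alpha", "beta"], ["gamma"], ["delta", "epsilon"]]

# Fan out one task per batch, then gather the results.
futures = [run_inference.remote(batch) for batch in batches]
results = ray.get(futures)
print(results)  # [[5, 4], [5], [5, 7]]
```

In practice a batch-inference platform would layer scheduling, retries, and observability on top of this fan-out pattern rather than calling `ray.get` directly from a driver script.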