
Software Engineer, ML Platform – ML Serving
Zoox
full-time
Location Type: Hybrid
Location: Foster City • California • United States
Salary
💰 $189,000 - $258,000 per year
About the role
- Build the off-vehicle inference service powering our foundation models (LLMs and VLMs) and the models that improve our rider experience.
- Lead the design, implementation, and operation of robust, efficient infrastructure for serving and monitoring ML models.
- Collaborate closely with cross-functional teams, including ML researchers, software engineers, and data engineers, to define requirements and align on architectural decisions.
- Mentor junior engineers on the team, providing technical guidance to help them grow their careers.
Requirements
- 4+ years of ML model serving infrastructure experience
- Experience building large-scale model serving systems on GPUs and/or for high-QPS, low-latency use cases.
- Experience with GPU-accelerated inference using Ray Serve, vLLM, TensorRT, NVIDIA Triton, or PyTorch.
- Experience with cloud providers such as AWS and with Kubernetes (K8s).
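The high-QPS, low-latency serving experience asked for above usually involves dynamic micro-batching: amortizing each GPU call over several queued requests while capping how long any request waits. A minimal sketch of that pattern, in plain Python with illustrative names and parameters (none of this is from the job description; frameworks like Ray Serve and vLLM implement production versions of it):

```python
import queue
import threading
from concurrent.futures import Future

MAX_BATCH = 8         # cap batch size to bound tail latency
BATCH_WAIT_S = 0.005  # wait up to 5 ms to fill a batch

def batching_loop(request_q, run_model):
    """Collect requests into micro-batches and run one model call per batch."""
    while True:
        item = request_q.get()
        if item is None:              # shutdown sentinel
            return
        batch = [item]
        # Opportunistically gather more requests, but never stall too long.
        while len(batch) < MAX_BATCH:
            try:
                nxt = request_q.get(timeout=BATCH_WAIT_S)
            except queue.Empty:
                break
            if nxt is None:
                request_q.put(None)   # re-post sentinel for the outer loop
                break
            batch.append(nxt)
        # One model call amortized over the whole batch.
        outputs = run_model([inp for inp, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)

def submit(request_q, x):
    """Enqueue one request; the caller awaits the returned Future."""
    fut = Future()
    request_q.put((x, fut))
    return fut
```

With a stand-in model (here, doubling each input in place of a GPU forward pass), callers submit individually but the worker batches transparently:

```python
q = queue.Queue()
worker = threading.Thread(
    target=batching_loop, args=(q, lambda xs: [x * 2 for x in xs]))
worker.start()
futs = [submit(q, i) for i in range(5)]
results = [f.result(timeout=2) for f in futs]
q.put(None)
worker.join()
```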
Benefits
- Health insurance
- Long-term care insurance
- Long-term and short-term disability insurance
- Life insurance
- Paid time off (e.g. sick leave, vacation, bereavement)
- Unpaid time off
- Zoox Stock Appreciation Rights
- Amazon Restricted Stock Units (RSUs)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
ML model serving infrastructure, GPU-accelerated inference, Ray Serve, vLLM, TensorRT, NVIDIA Triton, PyTorch, high QPS serving, low latency serving, cloud computing
Soft Skills
technical guidance, mentorship, collaboration, cross-functional teamwork, architectural decision-making