Act as a technical leader on the team, driving the end-to-end design, development, and delivery of critical data plane components
Architect and refine system design proposals for our high-scale AI inference cloud ecosystem
Implement and optimize distributed inference hosting using tensor/data parallelism and smart routing
Work cross-functionally with Product Managers and other teams to align technical roadmaps
Coach and mentor junior engineers
Maintain and operate critical services, using observability tools and defining SLOs

Requirements

Strong experience with microservices, messaging systems, databases, and infrastructure as code
Hands-on experience hosting large language or multimodal models using inference engines like vLLM, SGLang, or Modular
Familiarity with distributed inference serving frameworks such as llm-d, NVIDIA Dynamo, or Ray Serve
Understanding of GPU-level optimization and experience with interconnect technologies like NVLink, XGMI, or RoCE
Knowledge of common LLM architectures and optimization techniques (e.g., continuous batching, quantization)
Expert-level proficiency in Go (Golang) or Python and familiarity with gRPC
Proven experience shipping customer-facing software products and running critical services in a high-scale environment
Experience integrating and building with open-source software

Benefits

Competitive salary
Professional development resources, including conference support and education reimbursement
Access to LinkedIn Learning courses
Employee Assistance Program
Flexible time off policy
Local employee meetups

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

microservices, messaging systems, databases, infrastructure as code, large language models, inference engines, distributed inference serving frameworks, GPU-level optimization, GoLang, Python

Soft Skills

technical leadership, coaching, mentoring, cross-functional collaboration