
Principal Machine Learning Engineer, Distributed vLLM Inference
Red Hat
full-time
Location Type: Hybrid
Location: Boston, Massachusetts, United States
Salary
💰 $189,600 - $312,730 per year
About the role
- Develop and maintain distributed inference infrastructure that uses Kubernetes APIs, operators, and the Gateway API Inference Extension for scalable LLM deployments.
- Create system components in Go and/or Rust to integrate with the vLLM project and manage distributed inference workloads.
- Design and implement KV cache-aware routing and scoring algorithms to optimize memory utilization and request distribution in large-scale inference deployments.
- Improve the resource utilization, fault tolerance, and stability of the inference stack.
- Contribute to the design, development, and testing of various inference optimization algorithms.
- Actively participate in technical design discussions and propose innovative solutions to complex challenges.
- Provide timely and constructive code reviews.
- Mentor and guide fellow engineers, fostering a culture of continuous learning and innovation.
Requirements
- Strong proficiency in Python and Go, plus at least one of Rust or C++.
- Experience with cloud-native Kubernetes networking and service mesh technologies such as Istio, Cilium, Envoy (including WASM filters), and CNI.
- A solid understanding of Layer 7 networking, HTTP/2, gRPC, and the fundamentals of API gateways and reverse proxies.
- Working knowledge of high-performance networking protocols and technologies including UCX, RoCE, InfiniBand, and RDMA is a plus.
- Excellent communication skills, capable of interacting effectively with both technical and non-technical team members.
- A Bachelor's or Master's degree in computer science, computer engineering, or a related field.
The following are considered a plus:
- Experience with the Kubernetes ecosystem, including core concepts, custom APIs, operators, and the Gateway API Inference Extension for GenAI workloads.
- Experience with GPU performance benchmarking and profiling tools such as NVIDIA Nsight, or with distributed tracing libraries and techniques such as OpenTelemetry.
- A Ph.D. in an ML-related field is a significant advantage.
Benefits
- Comprehensive medical, dental, and vision coverage
- Flexible Spending Account - healthcare and dependent care
- Health Savings Account - high deductible medical plan
- Retirement 401(k) with employer match
- Paid time off and holidays
- Paid parental leave plans for all new parents
- Leave benefits including disability, paid family medical leave, and paid military leave
- Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!