
Manager, Large Language Model Inference
NVIDIA
full-time
Location Type: Hybrid
Location: Santa Clara • California • United States
Salary
💰 $184,000 - $356,500 per year
About the role
- Lead and grow a team responsible for specialized kernel development, runtime optimizations, and frameworks for LLM inference
- Drive the design, development, and delivery of production inference software, targeting NVIDIA's next-generation enterprise and edge hardware platforms
- Integrate cutting-edge technologies developed at NVIDIA and offer an intuitive developer experience for LLM deployment
- Lead software development execution, with responsibility for project planning, milestone delivery, and cross-functional coordination
Requirements
- MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field
- 7+ years of overall software engineering experience, including 3+ years of technical leadership experience
- Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups
- Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries
- Demonstrated expertise in large language models (LLM) and/or vision language models (VLM)
Benefits
- health insurance
- retirement plans
- paid time off
- flexible work arrangements
- professional development
- bonuses
- stock options
- equity
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
C++, Python, software design, production-quality software, kernel development, runtime optimizations, LLM inference, software libraries, large language models, vision language models
Soft Skills
leadership, project planning, milestone delivery, cross-functional coordination, team building, communication, scaling teams, collaboration
Certifications
MS in Computer Science, PhD in Computer Science, equivalent experience in Computer Engineering, equivalent experience in AI