
Principal Software Engineer – AI Inference
NVIDIA
full-time
Location Type: Hybrid
Location: Santa Clara • California • United States
Salary
$272,000 - $431,250 per year
About the role
- Drive upstream-first engineering in vLLM/SGLang: author and land PRs (or bring equivalent experience), engage in development discussions, help shape roadmaps, and build durable maintainer relationships.
- Build and implement inference-runtime features that improve efficiency, latency, and tail behavior: request scheduling, batching policies, KV-cache management (paging/sharding), memory planning, and streaming.
- Optimize core hot paths across the stack—from Python orchestration down to C++/CUDA kernels—using profiling and measurement to guide decisions.
- Improve multi-GPU and multi-node inference: communication patterns, parallelism strategies (tensor/sequence/pipeline), and system-level scaling/efficiency.
- Strengthen correctness, robustness, and operability: determinism where needed, graceful degradation, backpressure, observability hooks, and performance regression testing.
- Collaborate across NVIDIA to integrate upstream advances with production needs (deployment patterns, compatibility, security posture) while keeping changes broadly adoptable by the community.
- Mentor senior engineers, raise the technical bar through design reviews, and establish guidelines for performance engineering and upstream contribution workflows.
Requirements
- 15+ years building production software with significant depth in systems engineering
- Strong track record of owning ambiguous, high-impact technical problems end-to-end
- Demonstrated expertise in LLM inference/serving systems (e.g., vLLM, SGLang) and the tradeoffs that drive real production performance
- Strong programming skills in Rust, C++, Python, CUDA; ability to read, modify, and optimize performance-critical code across layers
- Experience with GPU performance analysis tools and methodologies (profiling, microbenchmarking, memory/comms analysis) and a strong measurement culture
- Solid foundation in distributed systems and concurrency: queues/schedulers, RPC/streaming, multi-process/multi-threaded runtime behavior, and scaling patterns across nodes
- Excellent communication skills; ability to influence across teams and represent NVIDIA well in open-source technical forums
- BS/MS in Computer Science, Computer Engineering, or related field (or equivalent experience)
Benefits
- equity
- benefits