NVIDIA

Principal Software Engineer – AI Inference

full-time

Location Type: Hybrid

Location: Santa Clara, California, United States

Salary

💰 $272,000 - $431,250 per year

About the role

  • Drive upstream-first engineering in vLLM/SGLang: author and land PRs (or demonstrate equivalent experience), engage in development discussions, help shape roadmaps, and build durable maintainer relationships.
  • Build and implement inference-runtime features that improve efficiency, latency, and tail behavior: request scheduling, batching policies, KV-cache management (paging/sharding), memory planning, and streaming.
  • Optimize core hot paths across the stack—from Python orchestration down to C++/CUDA kernels—using profiling and measurement to guide decisions.
  • Improve multi-GPU and multi-node inference: communication patterns, parallelism strategies (tensor/sequence/pipeline), and system-level scaling/efficiency.
  • Strengthen correctness, robustness, and operability: determinism where needed, graceful degradation, backpressure, observability hooks, and performance regression testing.
  • Collaborate across NVIDIA to integrate upstream advances with production needs (deployment patterns, compatibility, security posture) while keeping changes broadly adoptable by the community.
  • Mentor senior engineers, raise the technical bar through design reviews, and establish guidelines for performance engineering and upstream contribution workflows.

Requirements

  • 15+ years building production software with significant depth in systems engineering
  • Strong track record of owning ambiguous, high-impact technical problems end-to-end
  • Demonstrated expertise in LLM inference/serving systems (e.g., vLLM, SGLang) and the tradeoffs that drive real production performance
  • Strong programming skills in Rust, C++, Python, and CUDA; ability to read, modify, and optimize performance-critical code across layers
  • Experience with GPU performance analysis tools and methodologies (profiling, microbenchmarking, memory/comms analysis) and a strong measurement culture
  • Solid foundation in distributed systems and concurrency: queues/schedulers, RPC/streaming, multi-process/multi-threaded runtime behavior, and scaling patterns across nodes
  • Excellent communication skills; ability to influence across teams and represent NVIDIA well in open-source technical forums
  • BS/MS in Computer Science, Computer Engineering, or related field (or equivalent experience)

Benefits

  • Equity
  • Benefits