NVIDIA

Senior Software Engineer – TensorRT, Edge-LLM

Full-time

Location Type: Hybrid

Location: Santa Clara, California; Texas; United States

Salary

$152,000 - $287,500 per year

About the role

  • Develop and evolve a state-of-the-art inference framework in modern C++ that extends TensorRT with autoregressive model serving capabilities, including speculative decoding, LoRA, MoE, and KV cache management.
  • Design and implement compiler and runtime optimizations tailored for transformer-based models running on constrained, real-time platforms.
  • Collaborate with teams across CUDA, kernel libraries, compilers, and robotics to deliver high-performance, production-ready solutions.
  • Contribute to CUDA kernel and operator development for critical transformer components such as attention, GEMM, and MoE.
  • Benchmark, profile, and optimize inference performance across diverse embedded and automotive environments.
  • Stay ahead of the rapidly evolving LLM/VLM ecosystem and bring emerging techniques into product-grade software.

Requirements

  • BS, MS, PhD, or equivalent experience in Computer Science, Electrical/Computer Engineering, or a closely related field.
  • 4+ years of relevant software development experience.
  • Deep understanding of transformer models and inference optimization techniques (e.g., quantization, tensor parallelism, or memory-efficient scheduling).
  • Proficiency in modern C++ (C++11/14/17 and beyond).
  • Familiarity with popular LLM frameworks and libraries such as TensorRT, TensorRT-LLM, vLLM, SGLang, MLC-LLM, or FlashInfer.
  • A track record of strong software design, execution, and collaboration across fields.

Benefits

  • Eligible for equity and benefits.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
C++, TensorRT, transformer models, inference optimization, quantization, tensor parallelism, memory-efficient scheduling, compiler optimizations, runtime optimizations, benchmarking

Soft Skills
Collaboration, software design, execution

Certifications
BS in Computer Science, MS in Computer Science, PhD in Computer Science, BS in Electrical/Computer Engineering, MS in Electrical/Computer Engineering, PhD in Electrical/Computer Engineering