Salary
💰 $184,000 - $356,500 per year
Tech Stack
D3.js, JavaScript, Python, PyTorch
About the role
- Characterize the latest LLMs and inference servers like vLLM and SGLang to ensure TRT-LLM maintains leadership
- Build engaging content with the performance marketing team (blog posts and written materials) highlighting TRT-LLM achievements
- Collaborate with engineers from AI startups to debug performance issues and establish standard methodologies
- Profile GPU kernel-level performance to identify hardware and software optimization opportunities
- Develop profiling and analysis software tools to keep up with rapid network scaling
- Contribute to deep learning software projects (PyTorch, TRT-LLM, vLLM, SGLang)
- Verify TRT-LLM performance for new GPU product launches
- Collaborate across software, research, and product teams to guide direction of inference serving
Requirements
- Master's or PhD degree in Computer Science, Computer Engineering, or related fields, or equivalent experience
- 6+ years of relevant industry experience
- Detailed knowledge of deep learning inference serving, PyTorch programming, profiling, and compiler optimizations
- Proficiency in Python and C++ programming languages and familiarity with CUDA
- Experience with LLMs and their performance challenges and opportunities
- Solid understanding of CPU and GPU microarchitecture and performance characteristics
- Experience with complex software projects like frameworks, compilers, or operating systems
- Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment
- (Ways to stand out) A drive to continuously improve software and hardware performance
- (Nice to have) Examples of novel use cases for agentic AI tools in the workplace
- (Nice to have) Experience with database and visualization tools like D3.js