Salary
💰 $184,000 - $356,500 per year
Tech Stack
D3.js, JavaScript, Python, PyTorch
About the role
- Characterize the latest LLMs and inference servers like vLLM and SGLang to ensure TRT-LLM maintains leadership
- Build engaging content with the performance marketing team (blog posts and written materials) highlighting TRT-LLM achievements
- Collaborate with engineers from AI startups to debug performance issues and establish standard methodologies
- Profile GPU kernel-level performance to identify hardware and software optimization opportunities
- Develop profiling and analysis software tools to keep up with rapid network scaling
- Contribute to deep learning software projects (PyTorch, TRT-LLM, vLLM, SGLang)
- Verify TRT-LLM performance for new GPU product launches
- Collaborate across software, research, and product teams to guide direction of inference serving
Requirements
- Master's or PhD degree in Computer Science, Computer Engineering, or related fields, or equivalent experience
- 6+ years of relevant industry experience
- Detailed knowledge of deep learning inference serving, PyTorch programming, profiling, and compiler optimizations
- Proficiency in Python and C++ programming languages and familiarity with CUDA
- Experience with LLMs and their performance challenges and opportunities
- Solid understanding of CPU and GPU microarchitecture and performance characteristics
- Experience with complex software projects like frameworks, compilers, or operating systems
- Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment
- (Ways to stand out) A drive to continuously improve software and hardware performance
- (Nice to have) Examples of novel use cases for agentic AI tools in the workplace
- (Nice to have) Experience with database and visualization tools like D3.js