
Senior DL Algorithms Engineer – Inference Performance
NVIDIA
full-time
Location Type: Remote
Location: California • United States
Salary
💰 $184,000 - $356,500 per year
About the role
- Implement language and multimodal model inference as part of NVIDIA Inference Microservices (NIMs).
- Contribute new features, fix bugs, and deliver production code to TRT-LLM, NVIDIA’s open-source LLM inference library.
- Profile and analyze bottlenecks across the full inference stack to push the boundaries of inference performance.
- Benchmark state-of-the-art inference offerings across various DL models and perform competitive analysis of the NVIDIA SW/HW stack.
- Collaborate heavily with other SW/HW co-design teams to enable the creation of the next generation of AI-powered services.
Requirements
- PhD in CS, EE or CSEE or equivalent experience.
- 5+ years of experience.
- Strong background in deep learning and neural networks, in particular inference.
- Experience with performance profiling, analysis and optimization, especially for GPU-based applications.
- Proficiency in C++ and PyTorch or equivalent frameworks.
- Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.
- Proven experience with processor and system-level performance optimization.
- Deep understanding of modern LLM architectures.
- Strong fundamentals in algorithms.
- GPU programming experience (CUDA or OpenCL) is a plus.
Benefits
- equity
- benefits
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
C++, PyTorch, deep learning, neural networks, performance profiling, performance optimization, GPU programming, CUDA, OpenCL, algorithms
Education
PhD in CS, PhD in EE, PhD in CSEE