
Senior DL Algorithms Engineer – Inference Performance
NVIDIA
full-time
Location Type: Remote
Location: California • United States
Salary
💰 $184,000 - $356,500 per year
About the role
- Implement language and multimodal model inference as part of NVIDIA Inference Microservices (NIMs).
- Contribute new features, fix bugs, and deliver production code to TRT-LLM, NVIDIA’s open-source LLM inference library.
- Profile and analyze bottlenecks across the full inference stack to push the boundaries of inference performance.
- Benchmark state-of-the-art inference offerings across various DL models and perform competitive analysis of the NVIDIA SW/HW stack.
- Collaborate heavily with other SW/HW co-design teams to enable the creation of the next generation of AI-powered services.
Requirements
- PhD in CS, EE or CSEE or equivalent experience.
- 5+ years of experience.
- Strong background in deep learning and neural networks, in particular inference.
- Experience with performance profiling, analysis and optimization, especially for GPU-based applications.
- Proficiency in C++ and PyTorch or equivalent frameworks.
- Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.
- Proven experience with processor and system-level performance optimization.
- Deep understanding of modern LLM architectures.
- Strong fundamentals in algorithms.
- GPU programming experience (CUDA or OpenCL) is a plus.
Benefits
- equity
- benefits
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
C++, PyTorch, deep learning, neural networks, performance profiling, performance optimization, GPU programming, CUDA, OpenCL, algorithms
Education
PhD in CS, PhD in EE, PhD in CSEE