Senior Performance Software Engineer, Deep Learning Libraries

NVIDIA

full-time

Posted on: 10/20/2025

Location Type: Hybrid

Location: Santa Clara • California, North Carolina, Oregon, Texas, Washington • 🇺🇸 United States


💰 $184,000 - $356,500 per year

Senior

Assembly

About the role

Writing highly tuned compute kernels, mostly in C++ CUDA, to perform core deep learning operations (e.g. matrix multiplies, convolutions, normalizations)
Following general software engineering best practices including support for regression testing and CI/CD flows
Collaborating with teams across NVIDIA: the CUDA compiler team on generating optimal assembly code; deep learning training and inference performance teams on which layers require optimization; and hardware and architecture teams on the programming model for new deep learning hardware features

Requirements

Master's or PhD degree or equivalent experience in Computer Science, Computer Engineering, Applied Math, or a related field
6+ years of relevant industry experience
Demonstrated strong C++ programming and software design skills, including debugging, performance analysis, and test design
Experience with performance-oriented parallel programming, even if it’s not on GPUs (e.g. with OpenMP or pthreads)
Solid understanding of computer architecture and some experience with assembly programming

Benefits

equity
benefits
