NVIDIA

Senior Software Engineer, CUDA Deep Learning Systems

This Senior Software Engineer role focuses on CUDA systems optimization for deep learning at NVIDIA and involves collaborating on projects that improve hardware performance across architectures.

  • Posted 6/28/2026
  • full-time
  • Santa Clara • California, Texas • 🇺🇸 United States
  • Senior
  • 💰 $184,000 - $356,500 per year

Tech Stack

Tools & technologies
  • Node.js
  • Python

About the role

Key responsibilities & impact
  • Explore, research, and prototype novel systems optimizations for advanced deep learning models at the intersection of high-level DL frameworks and low-level CUDA through modeling, simulation, and silicon prototyping.
  • Architect and optimize distributed computing systems that scale seamlessly from a single node to massive, cluster-scale supercomputing environments.
  • Design, implement, and optimize custom high-performance CUDA kernels tailored to emerging neural network architectures and workloads.
  • Analyze complex hardware-software interactions to identify and resolve performance bottlenecks in both training and inference pipelines.
  • Collaborate closely with AI researchers, hardware and software architects, kernel and compiler authors, and CUDA driver experts to co-design systems and algorithms that improve accelerator compute utilization, memory bandwidth, cross-node network communication efficiency, and programmability.
  • Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning.
  • Write clean, effective, and maintainable code, ensuring exploratory prototypes can smoothly transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products.

Requirements

What you’ll need
  • BS, MS, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
  • 8+ years of relevant industry experience, or equivalent academic experience after completing the degree.
  • Strong proficiency in C++ and Python programming.
  • Solid background in the fundamentals of Deep Learning with a focus on transformers.
  • Strong understanding of distributed computing principles, multi-node scaling, and the unique performance challenges of cluster-scale execution.
  • Proven experience in systems programming, computer architecture, and low-level systems performance optimization.
  • Familiarity with deep learning accelerator architectures such as GPUs, and hands-on experience with CUDA programming and kernel optimization.
  • A strong analytical approach with experience using profiling tools to deeply understand software performance on hardware.
  • Experience profiling and optimizing innovative vision models, generative AI architectures, or diffusion models.
  • Background in deep learning compilers, both graph-level and codegen (e.g., Triton, XLA, torch.compile).

Benefits

Comp & perks
  • equity
  • benefits

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • Deep Learning
  • CUDA Kernel Optimization
  • Systems Programming
  • Computer Architecture
  • Performance Optimization
  • Profiling Tools
  • Transformers
  • Multi-Node Scaling
  • Vision Models
  • Generative AI Architectures
Soft Skills
  • Analytical Approach
  • Collaboration