NVIDIA

Senior Production Engineer – DGX Cloud

NVIDIA is hiring a Senior Production Engineer to advance scalable AI infrastructure, support production systems for GPU clusters, and improve reliability across AI workloads.

Posted 5/19/2026 • Full-time • Remote • California, Colorado, North Carolina, Texas, Washington • 🇺🇸 United States • Senior • 💰 $168,000 - $333,500 per year

Tech Stack

Tools & technologies
Cloud • Go • Python

About the role

Key responsibilities & impact
  • You will be part of a DGX Cloud team responsible for production systems that enable large, scalable GPU clusters to be used for a variety of AI workloads.
  • This includes working on custom software related to GPU asset provisioning, configuration, and lifecycle management across cloud providers.
  • Implementing monitoring and health management capabilities that enable industry-leading reliability, availability, and scalability of GPU assets.
  • You will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry.
  • Working with teams across NVIDIA to ensure production AI clusters run reliably and consistently with maximum performance.
  • Evaluating system failures and improving services based on a well-defined incident management process.

Requirements

What you’ll need
  • Direct experience in a Production Engineering/DevOps/SRE role within a highly technical organization with demonstrable impact from your work.
  • You are highly motivated and have strong communication skills; you can work successfully with multi-functional teams, principals, and architects, and coordinate effectively across organizational boundaries and geographies.
  • 8+ years in a similar role, with experience on large-scale production systems.
  • Experience with the aforementioned Production Engineering/DevOps/SRE principles, tools and techniques.
  • You possess a BS in Computer Science, Engineering, Physics, Mathematics, or a comparable degree, or equivalent experience.
  • Technical knowledge, including a systems programming language (Go, Python) and a solid understanding of data structures and algorithms.

Benefits

Comp & perks
  • equity
  • benefits
