Senior Production Engineer – DGX Cloud
NVIDIA
Senior Production Engineer at NVIDIA, responsible for advancing scalable AI infrastructure solutions, supporting production systems for GPU clusters, and enhancing reliability across AI workloads.
Posted 5/19/2026 • Full-time • Remote • California, Colorado, North Carolina, Texas, Washington • 🇺🇸 United States • Senior • 💰 $168,000 - $333,500 per year
Tech Stack
Tools & technologies: Cloud, Go, Python
About the role
Key responsibilities & impact
- You will be part of a DGX Cloud team responsible for production systems that enable large, scalable GPU clusters to be used for a variety of AI workloads.
- This includes working on custom software related to GPU asset provisioning, configuration, and lifecycle management across cloud providers.
- Implementing monitoring and health management capabilities that enable industry-leading reliability, availability, and scalability of GPU assets.
- You will be harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry.
- Working with teams across NVIDIA to ensure production AI clusters run reliably and consistently with maximum performance.
- Evaluating system failures and improving services based on a well-defined incident management process.
Requirements
What you’ll need
- Direct experience in a Production Engineering/DevOps/SRE role within a highly technical organization, with demonstrable impact from your work.
- You are highly motivated, with strong communication skills; you can work successfully with multi-functional teams, principals, and architects, and coordinate effectively across organizational boundaries and geographies.
- 8+ years in a similar role and experience on large-scale production systems.
- Experience with the aforementioned Production Engineering/DevOps/SRE principles, tools and techniques.
- You possess a BS in Computer Science, Engineering, Physics, Mathematics, or a comparable degree, or equivalent experience.
- Technical knowledge, including a systems programming language (Go, Python) and a solid understanding of data structures and algorithms.
Benefits
Comp & perks
- equity
- benefits
ATS Keywords
Hard Skills & Tools
GPU asset provisioning, configuration management, lifecycle management, monitoring capabilities, health management, data structures, algorithms, Go, Python
Soft Skills
strong communication skills, team collaboration, coordination, motivation
Certifications
BS in Computer Science, BS in Engineering, BS in Physics, BS in Mathematics