NVIDIA

Director, Site Reliability and Software Engineering – DGX Cloud

Site Reliability and Software Engineering leader responsible for scalable systems in NVIDIA's DGX Cloud. The role oversees engineering teams and drives the success of technical projects in a fast-paced environment.

Posted 5/4/2026 • Full-time • Remote • California • 🇺🇸 United States • Lead • 💰 $320,000 - $575,000 per year

Tech Stack

Tools & technologies
Cloud, Distributed Systems, Linux, SDLC, Unix

About the role

Key responsibilities & impact
  • Manage a team of Software and Site Reliability engineers, including program development, task planning and code reviews.
  • Define team strategy and roadmap, and drive adoption of scalable SDLC practices, test infrastructure, and modern practices in NVIDIA’s DGX Cloud Computing environment.
  • Drive technical projects and provide leadership in an innovative and fast-paced environment.
  • Be responsible for the overall planning, tracking and success of technical projects.
  • Work closely with project and product management teams to ensure best-in-class product development.
  • Contribute hands-on to technical projects for DGX Cloud Computing Services.
  • Interact with key internal stakeholders to provide operational and financial clarity on technical spend.
  • Lead executive reporting, dashboards, and operational CTO metrics, with a focus on continuous improvement to support better decision making and executive visibility.

Requirements

What you’ll need
  • 12+ years of overall experience in engineering management
  • 5+ years of leadership
  • Bachelor's or Master's degree in Computer Science, or equivalent experience
  • Experience in designing and implementing large-scale distributed systems
  • Experience with containers, virtualization environments, and cluster solutions
  • Experience in managing Technical Support / DevOps teams
  • Strong knowledge in Unix/Linux
  • Demonstrated people management and leadership skills, with a proven track record of mentoring and coaching team members
  • Ability to quickly learn and evaluate new technologies
  • Ability to influence and establish relationships with other software and IT functional groups such as development, server, storage and security teams.

Benefits

Comp & perks
  • Equity
  • Benefits

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
software engineering management, site reliability engineering, program development, task planning, code reviews, SDLC practices, test infrastructure, large-scale distributed systems, containers, virtualization
Soft Skills
leadership, people management, mentoring, coaching, influencing, relationship building, communication, strategic planning, continuous improvement, decision making
Certifications
Bachelor degree in Computer Science, Master degree in Computer Science