Director, Site Reliability and Software Engineering – DGX Cloud
NVIDIA
Site Reliability and Software Engineering leader for NVIDIA's DGX Cloud, managing scalable systems, overseeing engineering teams, and driving technical project success in a fast-paced environment.
Posted 5/4/2026 • Full-time • Remote • California • 🇺🇸 United States • Lead • 💰 $320,000 - $575,000 per year
Tech Stack
Tools & technologies
- Cloud
- Distributed Systems
- Linux
- SDLC
- Unix
About the role
Key responsibilities & impact
- Manage a team of Software and Site Reliability engineers, including program development, task planning, and code reviews.
- Define team strategy and roadmap, and drive adoption of scalable SDLC practices, test infrastructure, and modern practices in NVIDIA’s DGX Cloud Computing environment.
- Drive technical projects and provide leadership in an innovative and fast-paced environment.
- Be responsible for the overall planning, tracking and success of technical projects.
- Work closely with project and product management teams to ensure best-in-class product development.
- Contribute technically to projects for DGX Cloud Computing Services.
- Interact with key internal stakeholders to provide operational and financial clarity on technical spend.
- Lead executive reporting, dashboards, and operational CTO metrics, focusing on continuous improvement to support decision making and executive visibility.
Requirements
What you’ll need
- 12+ years of overall experience in engineering management
- 5+ years of leadership
- Bachelor's or Master's degree in Computer Science, or equivalent experience
- Experience in designing and implementing large-scale distributed systems
- Experience in containers, virtualization environments, and cluster solutions
- Experience in managing Technical Support / DevOps teams
- Strong knowledge in Unix/Linux
- Demonstrated people management and leadership skills, with a proven track record of mentoring and coaching team members
- Ability to quickly learn and evaluate new technologies
- Ability to influence and establish relationships with other software and IT functional groups such as development, server, storage and security teams.
Benefits
Comp & perks
- Equity
- Benefits
ATS Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- software engineering management
- site reliability engineering
- program development
- task planning
- code reviews
- SDLC practices
- test infrastructure
- large-scale distributed systems
- containers
- virtualization
Soft Skills
- leadership
- people management
- mentoring
- coaching
- influencing
- relationship building
- communication
- strategic planning
- continuous improvement
- decision making
Certifications
- Bachelor's degree in Computer Science
- Master's degree in Computer Science