
Distinguished Engineer – GPU Fleet Operations Automation
NVIDIA
full-time
Posted on:
Location Type: Remote
Location: Remote • California, Colorado, Illinois, New York, Texas • 🇺🇸 United States
Salary
💰 $308,000 - $471,500 per year
Job Level
Senior, Lead
Tech Stack
Cloud, Kubernetes
About the role
- Architecture: Define and drive the technical implementation of the DGX Cloud operations practice for the GPU fleet lifecycle.
- Collaborate Across Domains: Drive technical strategy and awareness of best practices and technical capabilities within DGX Cloud engineering practices.
- Accelerate Integration: Guide technical delivery into DGX Cloud across all delivery environments: enterprise, public cloud, and high-security, isolated, and sovereign environments.
- Engage Stakeholders: Collaborate with customers, infrastructure providers, and partners to ensure NVIDIA’s solutions set the industry standard for operational excellence.
- Full Software and System Lifecycle: Lead all technical aspects of planning and continuously evolving a large technical scope, from ideation through architecture, design, development, deployment, operations, and full lifecycle management.
Requirements
- 15-18+ years of overall experience in technical roles, with a focus on operations and automation for cloud infrastructure, platforms, and applications.
- 5-10+ years of technical leadership experience.
- BS, MS, or higher degree in systems or software engineering or a related engineering field, or equivalent experience.
- Technical proficiency in multi-tenant data center and cloud-native architectures, spanning bare metal, virtualization, containerization, and higher-level abstractions (IaaS, Kubernetes, Slurm), as well as AI/ML platforms and applications.
- Demonstrated success delivering high-impact, technically complex solutions that provide a high degree of transparency into resource utilization, performance, and operational insights.
- Technical Leadership: Ability to synthesize multi-functional needs into architecture and design while guiding internal execution across complementary teams.
- Communication and Partnership: Strong collaboration and influence skills; capable of leading engineering engagements, presenting to peers and partners, and working with high-performance accelerated computing customers.
Benefits
- Equity
- Benefits