
Technical Program Manager, Cloud Infrastructure
NVIDIA
full-time
Location Type: Office
Location: Santa Clara • California • Washington • United States
Salary
💰 $200,000 - $322,000 per year
About the role
- As a DGX Cloud Technical Program Manager, you'll be a key partner to our Engineering, Infrastructure, and Software teams and their leadership, driving critical programs related to AI capacity enablement and management.
- You'll play a pivotal role in developing and maturing foundational capabilities and processes for DGX Cloud, spanning critical areas such as cluster and capacity bring-up, including the CPU, storage, networking, and compute requirements needed to support GPUs.
- This is a dynamic, fast-paced environment where TPMs are expected to apply fungible skillsets to a range of high-impact programs across DGX Cloud.
- Collaborating closely with storage engineering and network engineering teams to define and communicate requirements to Cloud Service Providers (CSPs) and NVIDIA Cloud Providers (NCPs).
- Driving alignment on a plan of record (POR) for capacity blocks based on workload needs.
- Driving early engagement with CSPs and NCPs to understand their managed storage and network solutions and to influence alignment with the NVIDIA Cloud roadmap.
- Gathering technical requirements, developing comprehensive roadmaps, establishing clear milestones, and ensuring adherence to our Product Lifecycle (PLC) process.
- Managing ongoing capacity operations and the engineering engagement with CSP and NCP partners, collaborating closely with an SRE lead.
- Focusing on availability, maintenance, and other critical performance indicators.
- Partnering closely within NVIDIA to understand workload requirements and the related hardware and infrastructure needs, including speeds and feeds, to optimize infrastructure readiness with CSPs and NCPs.
- Leveraging Jira and other program management platforms to instill rigor and structure in the management of engineering deliverables.
- Identifying and driving opportunities to adopt third-party and in-house cloud infrastructure solutions for deployments, support, security, compliance, and observability across DGX Cloud.
- Establishing key performance indicators (KPIs) and quantitatively demonstrating the value and impact delivered by your programs.
- Proactively identifying, resolving, and mitigating risks and issues that could affect scope, schedule, and quality across all program aspects.
- Cultivating a culture of continuous improvement, consistently identifying opportunities for process enhancements within our cloud infrastructure operations.
Requirements
- 12+ years of technical program management experience, specifically driving the planning and execution of large-scale cloud infrastructure programs with external partners, with a strong focus on software engineering projects within a matrixed organization.
- Extensive hands-on experience in cloud infrastructure, preferably gained from working at a major Cloud Service Provider (CSP).
- Domain knowledge in the bring-up and end-to-end operations of compute, storage, networking, and GPUs (including common failure points at the hardware and software levels).
- Expert-level proficiency with Jira, Smartsheet, or similar program management tools, with the ability to confidently guide engineering teams on their use of the tools.
- Exceptional strategic and tactical thinking abilities, coupled with a strong capacity to build consensus and drive program success.
- Comfort and effectiveness working in ambiguous environments.
- Excellent communication and technical presentation skills, particularly for executive audiences.
- BS or MS in Electrical Engineering or Computer Science, or equivalent experience.
Benefits
- equity
- benefits
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
technical program management, cloud infrastructure, software engineering, capacity management, GPU management, networking, storage management, performance indicators, risk management, process improvement
Soft Skills
strategic thinking, tactical thinking, consensus building, communication skills, technical presentation skills, adaptability, collaboration, problem-solving, leadership, continuous improvement
Certifications
BS in Electrical Engineering, MS in Computer Science