NVIDIA

Technical Program Manager, Cloud Infrastructure

full-time

Location Type: Office

Location: Santa Clara, California, Washington, United States

Salary

💰 $200,000 - $322,000 per year

About the role

  • As a DGX Cloud Technical Program Manager, you'll be a key partner to our Engineering, Infrastructure, and Software teams and their leadership, driving critical programs related to AI capacity enablement and management.
  • You'll play a pivotal role in developing and maturing foundational capabilities and processes for DGX Cloud, spanning critical areas such as cluster/capacity bring-up, including the CPU, storage, networking, and compute requirements needed to support GPUs.
  • This is a dynamic, fast-paced environment where TPMs are expected to apply fungible skillsets to a range of high-impact programs across DGX Cloud.
  • Collaborating closely with storage engineering and network engineering teams to define and communicate requirements to Cloud Service Providers (CSPs) and NVIDIA Cloud Providers (NCPs).
  • Driving alignment on a plan of record (POR) for capacity blocks based on workload needs.
  • Driving early engagement with CSPs and NCPs to understand their managed storage and network solutions and to influence alignment with the NVIDIA Cloud roadmap.
  • Gathering technical requirements, developing comprehensive roadmaps, establishing clear milestones, and ensuring adherence to our Product Lifecycle (PLC) process.
  • Managing ongoing capacity operations and the engineering engagement with CSP and NCP partners, collaborating closely with an SRE lead.
  • Focusing on availability, maintenance, and other critical performance indicators.
  • Partnering closely within NVIDIA to understand workload requirements and the related HW and infra needs, including speeds and feeds, to optimize infrastructure readiness with CSPs and NCPs.
  • Leveraging Jira and other program management platforms to instill rigor and structure in the management of engineering deliverables.
  • Identifying and driving opportunities to adopt third-party and in-house cloud infrastructure solutions for deployments, support, security, compliance, and observability across DGX Cloud.
  • Establishing key performance indicators (KPIs) and quantitatively demonstrating the value and impact delivered by your programs.
  • Proactively identifying, resolving, and mitigating risks and issues that could affect scope, schedule, and quality across all program aspects.
  • Cultivating a culture of continuous improvement, consistently identifying opportunities for process enhancements within our cloud infrastructure operations.

Requirements

  • 12+ years of technical program management experience, specifically driving the planning and execution of large-scale cloud infrastructure programs with external partners, with a strong focus on software engineering projects within a matrixed organization.
  • Extensive hands-on experience in cloud infrastructure, preferably gained from working at a major Cloud Service Provider (CSP).
  • Domain knowledge in the bring-up and end-to-end operations of compute, storage, networking, and GPUs (including common failure points at the HW and SW levels).
  • Expert-level proficiency with Jira, Smartsheet, or similar program management tools, with the ability to confidently guide engineering teams on their use of the tools.
  • Exceptional strategic and tactical thinking abilities, coupled with a strong capacity to build consensus and drive program success.
  • Comfort and effectiveness in ambiguous environments.
  • Excellent communication and technical presentation skills, particularly for executive audiences.
  • BS or MS in Electrical Engineering or Computer Science, or equivalent experience.

Benefits

  • equity
  • benefits

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
technical program management, cloud infrastructure, software engineering, capacity management, GPU management, networking, storage management, performance indicators, risk management, process improvement
Soft Skills
strategic thinking, tactical thinking, consensus building, communication skills, technical presentation skills, adaptability, collaboration, problem-solving, leadership, continuous improvement
Certifications
BS in Electrical Engineering, MS in Computer Science