Senior Product Manager, AI Factory Infra

NVIDIA

Senior Product Manager guiding automation strategy and improving the operator experience for AI factory infrastructure, including break-fix automation across multiple vendors and technologies.

Posted 5/28/2026 • Full-time • Senior
Location: Santa Clara • California, New York, Washington • 🇺🇸 United States
💰 $208,000 - $379,500 per year

Tech Stack

Tools & technologies
  • Distributed Systems

About the role

Key responsibilities & impact
  • Own the strategic direction and roadmap of the break-fix automation system across multiple vendors, technologies, and cloud service providers (CSPs).
  • Define automation confidence thresholds, blocking issue criteria, and human-in-the-loop intervention points that balance speed with operational safety.
  • Build the operator UX for repair queues, workflow transparency, and audit trails — ensuring on-call engineers have the context they need to act quickly and confidently.
  • Drive the integration between failure attribution and automated repair actions, following through from detection to resolution.
  • Define repair SLOs and own the metrics framework for time-to-drain, time-to-healthy, and overall fleet availability.
  • Collaborate with NCP operators, SRE teams, and hardware vendor partners to integrate RMA processes and optimize repair workflows at scale.

Requirements

What you’ll need
  • 12+ years of product management experience in infrastructure, platform, or MLOps, or an equivalent background.
  • BS or MS in Computer Science, Engineering, or a related technical area, or equivalent experience.
  • Demonstrated expertise with distributed systems, workflow orchestration, and the safety tradeoffs inherent in automation.
  • Track record owning products with real-world operational consequences — you understand blast radius and build accordingly.
  • Strong operator UX instincts — proven ability to translate complex system state into workflows that on-call engineers can act on under pressure.
  • Ability to build alignment across engineering, SRE, and external vendor partner teams.

Benefits

Comp & perks
  • Competitive salaries
  • Comprehensive benefits package
  • Eligibility for equity

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
automation, workflow orchestration, distributed systems, metrics framework, repair SLOs, failure attribution, break-fix automation, human-in-the-loop, time-to-drain, time-to-healthy
Soft Skills
strategic direction, operator UX instincts, collaboration, alignment building, contextual understanding, pressure handling, translating complex systems, decision making, problem solving, communication
Certifications
BS in Computer Science, MS in Computer Science, Engineering degree