Senior Product Manager, AI Factory Infra
NVIDIA
Product Manager guiding automation strategy and improving the operator experience for AI factory infrastructure, including break-fix automation across multiple vendors and technologies.
Posted 5/28/2026 • Full-time • Senior
Santa Clara • California, New York, Washington • 🇺🇸 United States
💰 $208,000 - $379,500 per year
Tech Stack
Tools & technologies
- Distributed Systems
About the role
Key responsibilities & impact
- Take full responsibility for the strategic direction and roadmap of the break-fix automation system spanning multiple vendors, technologies, and CSPs.
- Define automation confidence thresholds, blocking issue criteria, and human-in-the-loop intervention points that balance speed with operational safety.
- Build the operator UX for repair queues, workflow transparency, and audit trails, ensuring on-call engineers have the context they need to act quickly and confidently.
- Drive the integration between failure attribution and automated repair actions, following through from detection to resolution.
- Define repair SLOs and own the metrics framework for time-to-drain, time-to-healthy, and overall fleet availability.
- Collaborate with NCP operators, SRE teams, and hardware vendor partners to integrate RMA processes and optimize repair workflows at scale.
Requirements
What you’ll need
- 12+ years of product management experience in infrastructure, platform, or MLOps areas, or equivalent background.
- BS or MS in Computer Science, Engineering, or a related technical area, or equivalent experience.
- Demonstrated expertise with distributed systems, workflow orchestration, and the safety tradeoffs inherent in automation.
- Track record of owning products with real-world operational consequences: you understand blast radius and build accordingly.
- Strong operator UX instincts, with a proven ability to translate complex system state into workflows that on-call engineers can act on under pressure.
- Ability to build alignment across engineering, SRE, and external vendor partner teams.
Benefits
Comp & perks
- Competitive salaries
- Comprehensive benefits package
- Eligibility for equity
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
automation, workflow orchestration, distributed systems, metrics framework, repair SLOs, failure attribution, break-fix automation, human-in-the-loop, time-to-drain, time-to-healthy
Soft Skills
strategic direction, operator UX instincts, collaboration, alignment building, contextual understanding, pressure handling, translating complex systems, decision making, problem solving, communication
Certifications
BS in Computer Science, MS in Computer Science, Engineering degree