NVIDIA

Capacity Operations Manager

full-time

Location Type: Remote

Location: California, United States

Salary

💰 $136,000 - $218,500 per year

About the role

  • Coordinate the development of High Performance Computing (HPC) clusters, collaborating closely with internal and external engineering teams.
  • Manage and optimize GPU capacity and other compute resources across multiple cloud service platforms to meet growing demand and ensure efficient deployment.
  • Design, improve, and manage data models, reporting platforms, data automation solutions, dashboards, and performance metrics that support NVIDIA Infrastructure governance programs and strategic capacity decisions.
  • Assess the technical and business requirements for GPU capacity and other compute resources from different internal and external groups.
  • Identify performance bottlenecks in day-to-day usage of compute resources and collaborate with relevant infrastructure teams to resolve them.
  • Drive infrastructure resource efficiency initiatives in partnership with engineering, finance, and product teams.
  • Develop and enhance tooling for our cloud infrastructure and analytics platform to optimize resource usage and performance for NVIDIA and its customers. This includes designing and building tools that automate workflows and, potentially, applying AI techniques to extract useful signals and insights from generated data.
  • Partner with Finance, Product, Service Owners, and Infrastructure Engineering teams to align cloud capacity management with company goals, and develop infrastructure and service-level benchmarks that support customer satisfaction.

Requirements

  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field, or equivalent experience.
  • 8+ years of overall experience in cloud computing, specifically in managing or using GPU capacity for high performance computing.
  • A proven record of large-scale computing operations and planning is a plus.
  • Strong technical proficiency in cloud architecture, development and deployment, and managing large data sets.
  • Experience with command line interfaces and shell scripting languages.
  • Comprehensive knowledge of cloud service models (IaaS, PaaS, SaaS) and cloud infrastructure technologies.
  • Practical experience with Cloud Service Providers including AWS, Azure, GCP, and OCI is essential.
  • Demonstrated experience applying AI tools and techniques to extract useful signals and insights from data, specifically to improve resource usage and automation.
  • Deep knowledge and active use of statistical modeling and machine learning approaches for boosting operational efficiency and supporting strategic capacity decisions.
  • Understanding of analytics, statistical modeling, and machine learning methodologies.
  • Strong communication and relationship-building skills, with the ability to work well across different departments and contribute to strategic decisions.
  • Self-starter who is self-motivated, focused, and self-sufficient, with a willingness to take on new challenges and adapt quickly in a dynamic environment.
  • Ability to operate effectively amidst uncertainty and rapidly changing business conditions, with an agile approach and a commitment to ongoing improvement.

Benefits

  • equity
  • benefits