Salary
💰 $120,000 - $235,750 per year
About the role
- Lead planning and deployment of AI data centers, including power and cooling systems, cabling, network provisioning, and bring-up/validation.
- Focus on data center audit, planning, and deployment, ensuring the integrity of NVIDIA platform infrastructure.
- Collaborate with other teams to plan and implement data center infrastructure solutions based on NVIDIA Datacenter reference architecture, including power distribution, cooling systems, network architecture, server hardware, and storage systems.
- Plan and manage deployment of NVIDIA's pioneering AI infrastructure solutions including highly complex rack-scale, liquid cooled compute and networking hardware systems.
- Conduct pre-deployment planning, including reviewing cluster and data center architecture, planning network port mapping and the fiber optic cabling BOM, identifying potential risks, training vendors, and finding areas for improvement.
- Evaluate customers' and partners' infrastructure design proposals for consistency with industry standards and regulatory requirements; provide feedback and recommendations to improve performance, scalability, and cost-effectiveness.
- Perform testing, troubleshooting, and validation of compute systems in collaboration with product and engineering teams.
- Act as the NVIS mentor, providing guidance and support to ensure the NVIS team's success in their respective roles.
- Establish and enforce quality assurance processes to verify that deployments meet established specifications and performance benchmarks.
- Conduct thorough bring-up, testing, and validation to confirm the functionality and reliability of infrastructure components.
- Drive continuous improvement initiatives to enhance the efficiency, resilience, and sustainability of data center infrastructure built on the NVIDIA data center reference architecture and deployment blueprint.
- Collaborate and communicate across internal teams, external vendors, and customers to facilitate the seamless integration of data center infrastructure solutions; serve as a domain expert and point of contact for infrastructure-related inquiries and blocking issues.
Requirements
- Bachelor's degree (or equivalent experience) in Engineering, Computer Science, Information Technology, or a related field.
- 3+ years of overall experience in enterprise and/or hyperscale data centers with continuous infrastructure deployment experience, preferably in high-density AI/HPC data centers.
- Working experience in data center operations or infrastructure management roles, focusing on large-scale data center deployments.
- Strong technical knowledge of and experience with the data center stack: power distribution, liquid cooling, servers, networking, storage, and pre-deployment planning.
- Relevant certification preferred.
- Demonstrated technical and project leadership in fluid situations, with the ability to adapt to unknowns and change.
- Excellent analytical, problem-solving, and decision-making skills, keen attention to detail, and a commitment to quality.
- Excellent communication and interpersonal abilities; capable of engaging customers and other collaborators in productive discussions.
- Organization and time management: able to plan, schedule, and organize job-related tasks to achieve goals within or ahead of established time frames.
- Willingness to travel (up to 40%).
- Linux system administration skills.
- Strong knowledge of the whole data center infrastructure stack.
- Flexible and agile; enjoys solving challenging problems.