
Senior AI Compute Engineer - NVIS
NVIDIA
full-time
Location Type: Hybrid
Location: Santa Clara • California • United States
Salary
💰 $124,000 - $235,750 per year
About the role
- Primary responsibilities will include deploying, managing, and validating AI Compute/HPC infrastructure in Linux-based environments for new and existing customers.
- Be the domain expert with customers during planning calls through implementation.
- Hand over related documentation and perform the knowledge transfers required to support customers as they begin rolling out some of the most sophisticated systems in the world!
- Provide feedback to internal teams by opening bugs, documenting workarounds, and suggesting improvements.
Requirements
- 5+ years of experience providing in-depth support and deployment services and solving problems for hardware and software products.
- Knowledge and experience with Linux system administration, process management, package management, task scheduling, kernel management, boot procedures/troubleshooting, performance reporting/optimization/logging, network-routing/advanced networking (tuning and monitoring).
- Cluster management and provisioning technologies for bare-metal servers (bonus credit for BCM (Base Command Manager)).
- Minimum of a four-year degree from an accredited university or college in Computer Science, Electrical or Computer Engineering or equivalent experience.
- Scripting proficiency (Bash, Python, Ansible, etc.).
- Excellent interpersonal skills and the ability to deliver resolutions for customer issues as they arise.
- Strong organizational skills and ability to prioritize/multi-task easily with limited supervision.
- Experience with schedulers such as SLURM, LSF, UGE, etc.
- An ability to travel to customer sites within the United States up to 30% of the time.
- Experience with benchmarking tools such as HPL, NCCL tests, and MLPerf, as well as with Kubernetes.
Benefits
- Eligible for equity and benefits.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Linux system administration, process management, package management, task scheduling, kernel management, boot procedures, performance reporting, network routing, scripting (Bash, Python, Ansible), cluster management
Soft Skills
interpersonal skills, problem-solving, organizational skills, prioritization, multi-tasking, customer support, communication, knowledge transfer, feedback provision, documentation
Certifications
Bachelor's degree in Computer Science, Bachelor's degree in Electrical Engineering, Bachelor's degree in Computer Engineering