
Senior HPC Solutions Architect
NVIDIA
full-time
Location Type: Remote
Location: California • United States
Salary
💰 $184,000 - $356,500 per year
About the role
- Assisting with deployment, debugging, and improving the efficiency of AI workloads on large-scale NVIDIA platforms.
- Identifying hardware issues, tracking them through bug reports to resolution, and keeping customers updated on progress.
- Benchmarking new framework features, analyzing performance, and sharing actionable insights with both customers and internal teams.
- Working directly with external customers/partners to solve cluster performance and stability issues, identify bottlenecks, and implement effective solutions.
- Building expertise and guiding customers in scaling workloads efficiently and reliably on the latest generation of NVIDIA GPUs.
- Collaborating with AI factory deployment teams and ensuring RAs/Blueprints are accurately followed and implemented.
Requirements
- BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
- 10+ years of experience in designing, managing, and supporting large-scale hybrid networks.
- Experience with scripting is helpful.
- Strong programming skills in at least one of the following languages: C, C++, or Python.
- Practical experience identifying and resolving bottlenecks in large-scale training workloads or parallel applications.
- Proven understanding of CPU and GPU architectures, CUDA, parallel filesystems, and high-speed interconnects.
- Experience working with large compute clusters, including an understanding of their internal scheduling and resource management mechanisms (e.g., SLURM or cloud-based clusters).
- System-level understanding of server/rack-level architecture, BMC, PCIe devices, network adapters, Linux OS, and kernel drivers.
- Excellent communication and liaison skills to work with customers, partners, and internal functions.
Benefits
- Equity and benefits
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
C, C++, Python, CUDA, parallel filesystems, high-speed interconnects, scripting, server architecture, rack-level architecture, kernel drivers
Soft Skills
communication skills, liaison skills, problem-solving, collaboration, customer service
Certifications
BS in Electrical Engineering, MS in Computer Science, PhD in Physics