NVIDIA

Senior HPC Solutions Architect

full-time

Location Type: Remote

Location: California, United States

Salary

💰 $184,000 - $356,500 per year

About the role

  • Assisting with deploying, debugging, and improving the efficiency of AI workloads on large-scale NVIDIA platforms.
  • Identifying hardware issues, tracking them through bug reports to resolution, and keeping customers updated on progress.
  • Benchmarking new framework features, analyzing performance, and sharing actionable insights with both customers and internal teams.
  • Working directly with external customers/partners to solve cluster performance and stability issues, identify bottlenecks, and implement effective solutions.
  • Building expertise and guiding customers in scaling workloads efficiently and reliably on the latest generation of NVIDIA GPUs.
  • Collaborating with AI factory deployment teams and ensuring RAs/Blueprints are accurately followed and implemented.

Requirements

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
  • 10+ years of experience in designing, managing, and supporting large-scale hybrid networks.
  • Experience with scripting is helpful.
  • Strong programming skills in at least one of the following languages: C, C++, or Python.
  • Practical experience identifying and resolving bottlenecks in large-scale training workloads or parallel applications.
  • Proven understanding of CPU and GPU architectures, CUDA, parallel filesystems, and high-speed interconnects.
  • Experience working with large compute clusters, with an understanding of their internal scheduling and resource management mechanisms (e.g., Slurm or cloud-based clusters).
  • System-level understanding of server/rack-level architecture, BMC, PCIe devices, Network Adapters, Linux OS, and kernel drivers.
  • Excellent communication and liaison skills to work with customers, partners, and internal functions.

Benefits

  • Equity and benefits

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
C, C++, Python, CUDA, parallel filesystems, high-speed interconnects, scripting, server architecture, rack-level architecture, kernel drivers
Soft Skills
communication skills, liaison skills, problem-solving, collaboration, customer service
Certifications
BS in Electrical Engineering, MS in Computer Science, PhD in Physics