NVIDIA

Senior System Software Engineer, NCCL – Partner Enablement

full-time

Location Type: Remote

Location: California, Texas, United States

Salary

$152,000 - $218,500 per year

About the role

  • Engage with our partners and customers to root-cause functional and performance issues reported with NCCL
  • Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
  • Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
  • Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters
  • Document and conduct trainings/webinars for NCCL
  • Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure, and support

Requirements

  • B.S./M.S. degree in CS/CE, or equivalent experience, with 5+ years of relevant experience
  • Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
  • Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
  • Experience working with the engineering or academic research communities supporting HPC or AI
  • Practical experience with high-performance networking: InfiniBand/RoCE/Ethernet networks, RDMA, topologies, congestion control
  • Expert in Linux fundamentals and a scripting language, preferably Python
  • Familiar with containers, cloud provisioning and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
  • Adaptability and passion to learn new areas and tools
  • Flexibility to work and communicate effectively across different teams and time zones

Benefits

  • equity
  • benefits
