NVIDIA

Senior Software Developer, AI Networking

full-time

Location Type: Remote

Location: California, Texas, United States

Salary

💰 $152,000 - $241,500 per year

About the role

  • Characterizing AI workloads and deep learning models aimed at large-scale LLM training and inference on NVIDIA supercomputers
  • Working on distributed systems, with a focus on high-performance networking and NVIDIA communication libraries
  • Benchmarking, profiling, and analyzing performance to find bottlenecks and identify opportunities for optimization, with a strong emphasis on networking
  • Developing a PyTorch trace-based profiling, analysis, and replay toolset to aid in benchmarking, debugging, and co-designing network systems for LLM workloads
  • Collaborating with multiple teams from hardware to software to provide performance analysis insights
  • Defining performance test plans, setting performance expectations for new technologies and solutions, and working to achieve performance targets

Requirements

  • B.Sc. in Computer Science or Software Engineering, or equivalent experience
  • 3+ years of experience with high-performance networking (RDMA, MPI, NCCL, SHARP)
  • Demonstrated ability in performance evaluation techniques and approaches
  • Experience with NVIDIA GPUs and CUDA
  • Knowledge of deep learning frameworks like TensorFlow or PyTorch
  • Expertise in collective communication libraries such as NCCL and in networking protocols such as RoCE and RDMA
  • Ability to learn quickly and independently, with strong analytical and problem-solving skills
  • Proficiency in programming languages: Python, Bash, and C++
  • Experience with a container-based development environment

Benefits

  • Competitive salaries
  • Generous benefits package
  • Equity opportunities

Applicant Tracking System Keywords

Hard Skills & Tools

high-performance networking, RDMA, MPI, NCCL, SHARP, performance evaluation techniques, NVIDIA GPUs, CUDA, deep learning frameworks, programming languages

Soft Skills

analytical skills, problem-solving skills, self-learning capabilities