
Senior Software Developer, AI Networking
NVIDIA
full-time
Location Type: Remote
Location: California • Texas • United States
Salary
💰 $152,000 - $241,500 per year
About the role
- Characterizing AI workloads and deep learning models aimed at large-scale LLM training and inference on NVIDIA supercomputers
- Working on distributed systems with a focus on high-performance networking and NVIDIA communication libraries
- Benchmarking, profiling, and analyzing performance to find bottlenecks and identify optimization opportunities, with a strong emphasis on networking
- Developing a PyTorch trace-based profiling, analysis, and replay toolset to aid in benchmarking, debugging, and co-designing network systems for LLM workloads
- Collaborating with multiple teams from hardware to software to provide performance analysis insights
- Defining performance test plans, setting performance expectations for new technologies and solutions, and working to achieve performance targets
Requirements
- B.Sc. in Computer Science, Software Engineering, or equivalent experience
- 3+ years of experience with high-performance networking (RDMA, MPI, NCCL, SHARP)
- Demonstrated ability in performance evaluation techniques and approaches
- Experience with NVIDIA GPUs and CUDA
- Knowledge of deep learning frameworks like TensorFlow or PyTorch
- Expertise in networking collective communication libraries such as NCCL and protocols like RoCE and RDMA
- Ability to learn quickly and independently, with strong analytical and problem-solving skills
- Proficiency in programming languages: Python, Bash, and C++
- Experience with a container-based development environment
Benefits
- Competitive salaries
- Generous benefits package
- Equity opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
high-performance networking, RDMA, MPI, NCCL, SHARP, performance evaluation techniques, NVIDIA GPUs, CUDA, deep learning frameworks, programming languages
Soft Skills
analytical skills, problem-solving skills, self-learning capabilities