
Principal Deep Learning Communication Architect
NVIDIA
Employment Type: Full-time
Location Type: Remote
Location: California • Texas • United States
Salary: $272,000 – $431,250 per year
About the role
- Define the long-term technical roadmap for communication libraries across NVIDIA’s next-generation platforms
- Lead the development of next-generation communication primitives and collective algorithms
- Partner with application developers to architect and implement specialized communication primitives
- Collaborate with silicon architects and software engineers to influence hardware specifications for next-generation networking
- Develop high-fidelity analytical models and simulators to predict system behavior under emerging workloads
Requirements
- Ph.D. or M.S. in Computer Science, Electrical Engineering, or a related field (or equivalent experience)
- 12+ years of industry experience in high-performance computing (HPC) or distributed deep learning
- Deep understanding of 3D parallelism (Data, Tensor, Pipeline) and advanced strategies including Context Parallelism, Expert Parallelism, and Zero Redundancy Optimizer (ZeRO) variants
- Deep technical proficiency with NCCL, UCX, UCC, NVSHMEM, or MPI
- Experience with RDMA, RoCE, and low-level InfiniBand verbs
- Advanced knowledge of high-throughput inference engines and schedulers, specifically TensorRT-LLM, vLLM, SGLang, and NVIDIA Dynamo
- Expert knowledge of the NVIDIA GPU memory hierarchy (HBM3e/HBM4, L2 cache)
Benefits
- equity
- benefits