
Principal Deep Learning Communication Architect
NVIDIA
Employment Type: Full-time
Location Type: Remote
Location: California • Texas • United States
Salary: $272,000 – $431,250 per year
About the role
- Define the long-term technical roadmap for communication libraries across NVIDIA’s next-generation platforms
- Lead the development of next-generation communication primitives and collective algorithms
- Partner with application developers to architect and implement specialized communication primitives
- Collaborate with silicon architects and software engineers to influence hardware specifications for next-generation networking
- Develop high-fidelity analytical models and simulators to predict system behavior under emerging workloads
Requirements
- Ph.D. or M.S. in Computer Science, Electrical Engineering, or a related field (or equivalent experience)
- 12+ years of industry experience in high-performance computing (HPC) or distributed deep learning
- Deep understanding of 3D parallelism (Data, Tensor, Pipeline) and advanced strategies including Context Parallelism, Expert Parallelism, and Zero Redundancy Optimizer (ZeRO) variants
- Deep technical proficiency with NCCL, UCX, UCC, NVSHMEM, or MPI
- Experience with RDMA, RoCE, and low-level InfiniBand verbs
- Advanced knowledge of high-throughput inference engines and schedulers, specifically TensorRT-LLM, vLLM, SGLang, and NVIDIA Dynamo
- Expert knowledge of the NVIDIA GPU memory hierarchy (HBM3e/HBM4, L2 cache)
Benefits
- equity
- benefits