Senior Deep Learning Engineer – Autonomous Vehicles
NVIDIA
Senior Deep Learning Systems Engineer building and scaling training libraries for autonomous driving at NVIDIA. Collaborating with research and platform teams on high-performance distributed systems.
Posted 6/29/2026
Full-time
Santa Clara • California, Colorado • 🇺🇸 United States
Senior
💰 $224,000 - $356,500 per year
Tech Stack
Tools & technologies
- Distributed Systems
- Kubernetes
- Python
- PyTorch
About the role
Key responsibilities & impact
- Crafting, scaling, and hardening deep learning infrastructure libraries and frameworks for training on multi-thousand GPU clusters.
- Improving efficiency throughout the training stack: data loaders, distributed training, scheduling, and performance monitoring.
- Building robust training pipelines and libraries to handle massive video datasets and enable rapid experimentation.
- Collaborating with researchers, model engineers, and internal platform teams to enhance efficiency, minimize stalls, and improve training availability.
- Owning core infrastructure components such as orchestration libraries, distributed training frameworks, and fault-resilient training systems.
- Partnering with leadership to ensure infrastructure scales with growing GPU capacity and dataset size while maintaining developer efficiency and stability.
Requirements
What you’ll need
- BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, or a related field, or equivalent experience.
- 12+ years of professional experience building and scaling high-performance distributed systems, ideally in ML, HPC, or large-scale data infrastructure.
- Extensive knowledge of deep learning frameworks (PyTorch is preferred), large-scale training (DDP/FSDP, NCCL, tensor/pipeline parallelism), and performance profiling.
- Strong systems background: datacenter networking (RoCE, IB), parallel filesystems (Lustre), storage systems, schedulers (Slurm, Kubernetes, etc.).
- Proficiency in Python and C++, with experience writing production-grade libraries, orchestration layers, and automation tools.
- Ability to work closely with multi-functional teams (ML researchers, infra engineers, product leads) and translate requirements into robust systems.
Benefits
Comp & perks
- Equity
- Benefits
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Deep Learning
- Distributed Training
- Performance Profiling
- Production-Grade Libraries
- Orchestration Layers
- Automation Tools
- Large Scale Training
- Fault-Resilient Training Systems
- Data Loaders
- Scheduling
Soft Skills
- Collaboration
- Communication