
Senior Deep Learning Framework Engineer
NVIDIA
full-time
Location Type: Remote
Location: California • Massachusetts • United States
Salary
💰 $152,000 - $241,500 per year
About the role
- Integrate new communication library features into AI frameworks, from PoC through performance analysis to production.
- Perform deep analysis of AI workloads and frameworks to identify multi-GPU communication requirements and opportunities.
- Collaborate hands-on with teams working on the latest AI models.
- Improve AI compilers to hide communications or perform automatic fusion.
- Conduct in-depth AI workload performance characterization on multi-GPU clusters.
- Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads.
- Author custom communication or fused compute-communication kernels to demonstrate peak performance on NVIDIA platforms.
- Influence the roadmap of the communication libraries NCCL and NVSHMEM.
- Collaborate with a very dynamic team across multiple time zones.
Requirements
- B.S., M.S., or Ph.D. in Computer Science or a related field (or equivalent experience), with 5+ years of software engineering and HPC/AI experience
- Development or integration experience with deep learning frameworks such as PyTorch and JAX, and inference engines such as TRT-LLM, vLLM, and SGLang
- Rapid prototyping and development with Python, C++, CUDA or related DSLs (Triton, cuTe)
- Solid grasp of AI models, parallelisms, and/or compiler technologies (e.g. torch.compile)
- Experience conducting performance benchmarking on AI clusters.
- Familiarity with at least one performance profiler toolchain (PyTorch profiler, NVIDIA Nsight Systems)
- Understanding of HPC/AI communication concepts (one-sided vs. two-sided communication, elasticity, resiliency, topology discovery, etc.)
- Adaptability and passion to learn new areas and tools
- Flexibility to work and communicate effectively across different teams and time zones
Benefits
- equity
- benefits
Applicant Tracking System Keywords
Hard Skills & Tools
Python, C++, CUDA, Deep Learning Frameworks, PyTorch, JAX, Inference Engines, Performance Benchmarking, AI Compilers, Multi-GPU Communication
Soft Skills
Adaptability, Collaboration, Communication, Flexibility, Passion for learning