NVIDIA

Senior Deep Learning Framework Engineer

Full-time

Location Type: Remote

Location: California, Massachusetts, United States

Salary

💰 $152,000 - $241,500 per year

About the role

  • Integrate new communication library features into AI frameworks, from proof of concept (PoC) through performance analysis to production.
  • Perform deep analysis of AI workloads and frameworks to identify multi-GPU communication requirements and opportunities.
  • Collaborate hands-on with teams working on the latest AI models.
  • Improve AI compilers to hide communication or perform automatic fusion.
  • Conduct in-depth AI workload performance characterization on multi-GPU clusters.
  • Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads.
  • Author custom communication or fused compute-communication kernels to showcase ultimate performance on NVIDIA platforms.
  • Influence the roadmap of the communication libraries NCCL and NVSHMEM.
  • Collaborate with a very dynamic team across multiple time zones.

Requirements

  • B.S., M.S., or Ph.D. in Computer Science or a related field (or equivalent experience), with 5+ years of software engineering and HPC/AI experience
  • Development or integration experience with deep learning frameworks such as PyTorch and JAX, and with inference engines such as TRT-LLM, vLLM, and SGLang
  • Rapid prototyping and development with Python, C++, CUDA, or related DSLs (Triton, CuTe)
  • Solid grasp of AI models, parallelism strategies, and/or compiler technologies (e.g., torch.compile)
  • Experience conducting performance benchmarking on AI clusters.
  • Familiarity with at least one performance profiler toolchain (e.g., PyTorch Profiler, NVIDIA Nsight Systems)
  • Understanding of HPC/AI communication concepts (one-sided vs. two-sided communication, elasticity, resiliency, topology discovery, etc.)
  • Adaptability and passion to learn new areas and tools
  • Flexibility to work and communicate effectively across different teams and time zones

Benefits

  • Equity
  • Benefits