NVIDIA

Senior AI Training Framework Engineer

Full-time

Location Type: Office

Location: Santa Clara, California, Washington, United States

Salary

💰 $152,000 - $241,500 per year

About the role

  • Design and develop the open-source GenAI Megatron Core and NeMo Framework.
  • Solve large-scale, end-to-end AI training and inference challenges across the full model lifecycle: initial orchestration, data pre-processing, model training and tuning, and model deployment.
  • Work at the intersection of AI applications, libraries, frameworks, and the entire software stack.
  • Innovate and improve model architectures, distributed training algorithms, and model parallel paradigms.
  • Accelerate foundation model training and fine-tuning with mixed precision recipes and next-gen NVIDIA GPU architectures.
  • Tune and optimize the performance of deep learning frameworks and software components.
  • Research, prototype, and develop robust and scalable AI tools and pipelines.

Requirements

  • MS, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field, plus 3+ years of industry experience.
  • Experience with AI frameworks (e.g., PyTorch, JAX) and/or inference and deployment environments (e.g., TRTLLM, vLLM, SGLang).
  • Proficient in Python programming, software design, debugging, performance analysis, test design and documentation.
  • Consistent record of working effectively across multiple engineering initiatives and improving AI libraries with new innovations.
  • Strong understanding of AI/Deep-Learning fundamentals and their practical applications.

Benefits

  • Equity
  • Benefits