NVIDIA

AI Software Engineer, LLM Inference Performance Analysis

full-time

Location Type: Office

Location: Santa Clara, California, New York, United States

Salary

💰 $124,000 - $218,500 per year

About the role

  • Analyze the performance of LLMs running on NVIDIA Compute Platforms using profiling, benchmarking, and performance analysis tools.
  • Understand compiler optimization pipelines and identify opportunities to improve them, including IR-based compiler middle-end optimizations and kernel-level transformations.
  • Design and develop new compiler passes and optimization techniques to deliver best-in-class, robust, and maintainable compiler infrastructure and tools.
  • Collaborate with hardware architecture, compiler, and kernel teams to understand how hardware and software co-design enables efficient LLM inference.
  • Work with globally distributed teams across compiler, kernel, hardware, and framework domains to investigate performance issues and contribute to solutions.

Requirements

  • Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
  • Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
  • Foundational understanding of modern deep learning models (including transformers and LLMs) and interest in inference performance and optimization.
  • Exposure to compiler concepts such as intermediate representations (IR), graph transformations, scheduling, or code generation through coursework, research, internships, or projects.
  • Familiarity with at least one deep learning framework or compiler/runtime ecosystem (e.g., TensorRT-LLM, PyTorch, JAX/XLA, Triton, vLLM, or similar).
  • Ability to analyze performance bottlenecks and reason about optimization opportunities across model execution, kernels, and runtime systems.
  • Experience working on class projects, internships, research, or open-source contributions involving performance-critical systems, compilers, or ML infrastructure.
  • Strong communication skills and the ability to collaborate effectively in a fast-paced, team-oriented environment.

Benefits

  • Equity
  • Benefits

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
C++, Python, compiler optimization, intermediate representations (IR), graph transformations, scheduling, code generation, deep learning models, performance analysis, performance benchmarking
Soft skills
strong communication, collaboration, team-oriented, problem-solving, analytical thinking