NVIDIA

Deep Learning Compiler Engineer – CUDA

full-time

Location Type: Office

Location: Shanghai, China

About the role

  • Design and implement the DSL and core compiler for a tile-aware GPU programming model targeting emerging GPU architectures
  • Continuously innovate and iterate on the core architecture of the compiler to consistently optimize performance
  • Investigate next-generation GPU architectures and provide solutions in the DSL and compiler stack
  • Analyze performance on emerging AI/LLM workloads and integrate with AI/ML frameworks

Requirements

  • Master's or PhD, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)
  • 2+ years of relevant work experience
  • Excellent C/C++ programming and software engineering skills; an ACM background is a plus
  • Good fundamental knowledge of computer architecture
  • Strong ability to abstract problems and a sound methodology for resolving them
  • A strong compiler background, including MLIR/TVM/Triton/LLVM, is desired
  • Good knowledge of GPU architecture and fast kernel programming skills are a plus
  • Knowledge of LLM algorithms or a certain HPC domain is a plus
  • Knowledge of multi-GPU distributed communication is a plus
  • Excellent oral communication in English is a plus

Benefits

  • Highly competitive salaries
  • Comprehensive benefits package

Applicant Tracking System Keywords

Hard skills
C, C++, compiler design, MLIR, TVM, Triton, LLVM, GPU programming, HPC, performance analysis
Soft skills
problem abstraction, problem resolution, oral communication
Certifications
Master's, PhD