
Deep Learning Compiler Engineer – CUDA
NVIDIA
Full-time
Location Type: Office
Location: Shanghai • China
About the role
- Design and implement the DSL and core compiler of a tile-aware GPU programming model for emerging GPU architectures
- Continuously innovate and iterate on the compiler's core architecture to keep improving performance
- Investigate next-generation GPU architectures and provide solutions in the DSL and compiler stack
- Analyze the performance of emerging AI/LLM workloads and integrate with AI/ML frameworks
Requirements
- Master's or PhD degree, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)
- 2+ years of relevant work experience
- Excellent C/C++ programming and software engineering skills; an ACM background is a plus
- Strong fundamental knowledge of computer architecture
- Strong ability to abstract problems and a sound methodology for solving them
- A strong compiler background, including MLIR/TVM/Triton/LLVM, is desired
- Good knowledge of GPU architecture and skill in writing fast kernels are a plus
- Knowledge of LLM algorithms or a specific HPC domain is a plus
- Knowledge of multi-GPU distributed communication is a plus
- Excellent oral communication in English is a plus
Benefits
- Highly competitive salaries
- Comprehensive benefits package
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
C, C++, compiler design, MLIR, TVM, Triton, LLVM, GPU programming, HPC, performance analysis
Soft skills
problem abstraction, problem resolution, oral communication
Certifications
Master's, PhD