
Deep Learning Compiler Engineer – CUDA
NVIDIA
Full-time
Location Type: Office
Location: Shanghai • China
About the role
- Design and implement the DSL and core compiler of a tile-aware GPU programming model for emerging GPU architectures
- Continuously innovate and iterate on the compiler's core architecture to keep improving performance
- Investigate next-generation GPU architectures and provide solutions in the DSL and compiler stack
- Analyze the performance of emerging AI/LLM workloads and integrate with AI/ML frameworks
Requirements
- Master's or PhD degree, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)
- 2+ years of relevant work experience
- Excellent C/C++ programming and software engineering skills; an ACM background is a plus
- Strong fundamental knowledge of computer architecture
- Strong ability to abstract problems and a sound methodology for solving them
- A strong compiler background, including MLIR/TVM/Triton/LLVM, is desired
- Good knowledge of GPU architecture and skill in writing fast kernels are a plus
- Knowledge of LLM algorithms or a specific HPC domain is a plus
- Knowledge of multi-GPU distributed communication is a plus
- Excellent oral communication in English is a plus
Benefits
- Highly competitive salaries
- Comprehensive benefits package
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
C, C++, compiler design, MLIR, TVM, Triton, LLVM, GPU programming, HPC, performance analysis
Soft skills
problem abstraction, problem resolution, oral communication
Certifications
Master's, PhD