
Deep Learning Performance Architect – Perf Tools
NVIDIA
full-time
Location Type: Office
Location: Beijing • 🇨🇳 China
Job Level
Mid-Level, Senior
Tech Stack
Python
About the role
- Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle
- Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities
- AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure
- Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures
Requirements
- BS or higher in Computer Science, Electronic Engineering, or a related field (or equivalent experience)
- 4+ years of software development experience
- Strong software skills in design, coding (C++ and Python), analysis, and debugging of low-level programs
- Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals
- Experience with performance modeling, architecture simulation, profiling, and analysis
- Self-starter who thrives in dynamic environments and manages competing priorities effectively
Benefits
- Professional development opportunities
Applicant Tracking System Keywords
Hard skills
C++, Python, performance modeling, architecture simulation, profiling, debugging, low-level programming, GPU performance analysis, AI/ML-driven tools, automated workflows
Soft skills
self-starter, dynamic environment adaptability, competing priorities management, analytical skills, collaboration
Certifications
BS in Computer Science, BS in Electronic Engineering