
Senior Performance Architect – Heterogeneous Workload Optimization
NVIDIA
full-time
Location Type: Hybrid
Location: Santa Clara • California • Massachusetts • United States
Salary
💰 $184,000 - $287,500 per year
About the role
- Architecting and maintaining custom profiling frameworks that provide a unified view of execution across CPU (multi-core/multi-socket) and GPU (multi-node/NVLink) environments.
- Conducting deep-dive benchmarking of EDA applications to characterize memory access patterns, cache hit rates, and instruction-level parallelism.
- Using GPU profilers to detect GPU-side inefficiencies such as warp divergence, sub-optimal occupancy, and PCIe/NVLink bottlenecks.
- Developing tools to monitor and attribute high-watermark memory usage in multi-terabyte EDA builds, finding opportunities for data structure compression or smarter memory pooling.
- Developing predictive models to guide hardware procurement and cloud instance selection based on design gate count and algorithmic complexity.
Requirements
- A grasp of the CUDA programming model and experience employing GPU profiling tools like NVIDIA Nsight Systems/Compute to address PCIe bottlenecks and kernel stalls.
- Extensive knowledge of profiling tools such as perf, eBPF, VTune, or Valgrind, along with insight into their internal mechanisms.
- A passion for meticulous benchmarking and the ability to distill sophisticated performance data into actionable engineering roadmaps.
- Experience with distributed compute environments (Slurm, LSF, or Kubernetes).
- A BS, MS, or PhD in Computer Science, Electrical Engineering, or a related field (or equivalent experience), with 8+ years of relevant experience and at least 5 years of systems-level performance analysis.
Benefits
- equity
- benefits