NVIDIA

Senior Performance Architect – Heterogeneous Workload Optimization

Full-time

Location Type: Hybrid

Location: Santa Clara, California; Massachusetts; United States

Salary

💰 $184,000 - $287,500 per year

About the role

  • Architecting and maintaining custom profiling frameworks that provide a unified view of execution across CPU (multi-core/multi-socket) and GPU (multi-node/NVLink) environments.
  • Conducting deep-dive benchmarking of EDA applications to characterize memory access patterns, cache hit rates, and instruction-level parallelism.
  • Using GPU profilers to detect GPU-side inefficiencies such as warp divergence, sub-optimal occupancy, and PCIe/NVLink bottlenecks.
  • Developing tools to monitor and attribute high-watermark memory usage in multi-terabyte EDA builds, finding opportunities for data structure compression or smarter memory pooling.
  • Developing predictive models to guide hardware procurement and cloud instance selection based on design gate count and algorithmic complexity.

Requirements

  • A grasp of the CUDA programming model and experience employing GPU profiling tools like NVIDIA Nsight Systems/Compute to address PCIe bottlenecks and kernel stalls.
  • Extensive knowledge of profiling tools such as perf, eBPF, VTune, or Valgrind, along with insight into their internal mechanisms.
  • A passion for meticulous benchmarking and the ability to distill complex performance data into actionable engineering roadmaps.
  • Experience with distributed compute environments (Slurm, LSF, or Kubernetes).
  • A BS, MS, or PhD in Computer Science, Electrical Engineering, or a related field (or equivalent experience), with 8+ years of relevant experience and at least 5 years in systems-level performance analysis.

Benefits

  • equity
  • benefits