NVIDIA

Senior Solutions Architect, AI Cluster Performance, Telemetry

Senior Solutions Architect specializing in Data Center Systems & Performance at NVIDIA. The role involves analyzing and optimizing the performance of world-class AI, deep learning, and HPC ecosystems.

Posted 6/4/2026
full-time
Santa Clara • California, Texas • 🇺🇸 United States
Senior
💰 $184,000 - $356,500 per year

Tech Stack

Tools & technologies
Ansible, Cloud, Docker, Grafana, Kubernetes, Prometheus

About the role

Key responsibilities & impact
  • Work together with our partners and customers to identify, analyze, and resolve complex performance bottlenecks across interconnected GPU, CPU, and networking systems.
  • Complete and maintain robust performance benchmarking suites to stress-test high-performance clusters and establish performance baselines.
  • Apply industry-standard performance tools to monitor hardware performance counters and extract deep system telemetry.
  • Deeply investigate system and software configurations to find and fix subtle discrepancies that impact peak performance.
  • Partner closely with internal engineering teams, outside collaborators, and customers to develop solutions and boost infrastructure performance.

Requirements

What you’ll need
  • BS or MS in Engineering, Electrical Engineering, Physics, or Computer Science (or equivalent experience).
  • 8+ years of work-related experience in the high-tech industry, particularly in system build, performance analysis, and technical customer-facing roles.
  • A strong understanding of how CPUs, GPUs, and high-speed networking fabrics interact within massive clusters.
  • Practical experience with performance counters, profiling tools, and telemetry collection systems (e.g., Perf, eBPF, Prometheus, Grafana).
  • Practical experience working with containers, cloud provisioning, and scheduling tools such as Docker, Docker Swarm, Kubernetes, SLURM, Ansible.
  • Proven track record of transforming raw logs and telemetry into structured time series data, dashboards, and heat maps.
  • The ability to translate complex, low-level technical performance anomalies into clear, actionable narratives for cross-functional teams.
  • Strong collaborative skills and a proven history of building successful relationships across diverse engineering and operations teams.

Benefits

Comp & perks
  • Equity
  • Benefits

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
performance analysis, performance benchmarking, system telemetry, performance counters, profiling tools, telemetry collection, cloud provisioning, scheduling tools, transforming raw logs, data visualization
Soft Skills
collaborative skills, relationship building, communication, problem-solving, analytical thinking, technical narrative translation, cross-functional teamwork, customer-facing skills, investigative skills, attention to detail