NVIDIA

Senior Manager, Engineering – Data Center Telemetry, RAS

full-time

Location Type: Hybrid

Location: Santa Clara, California, United States

Salary

💰 $272,000 - $431,250 per year

About the role

  • Own the end-to-end architecture and delivery for telemetry solutions, including fleet health monitoring, fault remediation, and data visualization at scale
  • Own the out-of-band (OOB) telemetry solution and the validation of telemetry data from each underlying device
  • Recruit, develop, and motivate a high-performing engineering team focused on platform telemetry, RAS, and observability
  • Continuously improve software development processes for optimal productivity and quality
  • Work across teams to ensure seamless integration of telemetry solutions with platform firmware, server architecture, and data center management
  • Drive product life cycles with QA teams, ensuring robust testing, productization, and delivery
  • Conduct performance reviews, foster a culture of excellence, and ensure high productivity

Requirements

  • 12+ years of overall relevant experience
  • 5+ years of managing systems/platform software teams
  • BS, MS, or PhD in EE/CS or related field (or equivalent experience)
  • Strong knowledge of DMTF/PLDM for OOB telemetry collection
  • Experience with time series databases (e.g., InfluxDB, Prometheus) and REST APIs (e.g., Redfish)
  • Deep understanding of server and firmware architecture and optimization for low-latency APIs
  • Proven track record of delivering scalable server products and telemetry solutions
  • Experience with SCM (Git, Perforce) and project management tools (Jira)
  • Hands-on experience with x86/ARM system architecture and coding (C/C++, Python)
  • Familiarity with Confidential Compute and notification systems
  • Demonstrated ability to analyze algorithms for time/space complexity and system resource requirements

Benefits

  • Equity
  • Benefits
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
DMTF, PLDM, time series databases, InfluxDB, Prometheus, REST APIs, C, C++, Python, x86/ARM system architecture
Soft Skills
team management, leadership, motivation, process improvement, collaboration, performance reviews, culture of excellence, productivity
Certifications
BS in EE, MS in EE, PhD in EE, BS in CS, MS in CS, PhD in CS