NVIDIA

Senior Systems Software Engineer, Data Center Infrastructure Management – EngOps

full-time

Location Type: Remote

Location: California, Texas, United States

Salary

💰 $152,000 - $287,500 per year

About the role

  • Take ownership of daily cluster failures and issues, troubleshooting them promptly to maintain optimal cluster availability and performance.
  • Manage updates to the site controller management nodes.
  • Manage the rollout and rollback of cluster software and firmware updates, ensuring smooth transitions and minimal disruptions.

Requirements

  • BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
  • 5+ years of hands-on experience deploying and administering clusters, servers, switches, and related infrastructure.
  • Experience with deployment and configuration of operating systems, computer networks, and high-performance applications.
  • Proven ability to work effectively with developers and test engineers across different teams and time zones.
  • Experience deploying services in Kubernetes.
  • Datacenter or computer architecture experience is required—you should understand server, rack, and network topologies, as well as hardware/firmware/software interactions.
  • Background with hardware management protocols (Redfish, IPMI, BMC) and firmware update automation.
  • Experience configuring and debugging complex data center networks.
  • Experience developing scripts to automate recovery actions for management controllers and datacenter systems.

Benefits

  • Equity
  • Benefits
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
cluster management, server administration, network configuration, Kubernetes, firmware updates, operating systems deployment, high-performance applications, scripting for automation, hardware management protocols, data center network debugging
Soft Skills
troubleshooting, collaboration, communication, problem-solving, ownership, adaptability, teamwork, cross-team coordination, time management, effective working across time zones
Certifications
BS in Computer Science, MS in Computer Science, BS in Computer Engineering, MS in Computer Engineering, BS in Electrical Engineering, MS in Electrical Engineering