
Senior Systems Software Engineer, Data Center Infrastructure Management – EngOps
NVIDIA
full-time
Location Type: Remote
Location: California • Texas • United States
Salary
💰 $152,000 - $287,500 per year
About the role
- Take ownership of daily cluster failures and issues, troubleshooting them promptly to maintain optimal cluster availability and performance.
- Manage updates to the site controller management nodes.
- Manage the rollout and rollback of cluster software and firmware updates, ensuring smooth transitions and minimal disruptions.
Requirements
- BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
- 5+ years of hands-on experience deploying and administering clusters, servers, switches, and related infrastructure.
- Experience with deployment and configuration of operating systems, computer networks, and high-performance applications.
- Proven ability to work effectively with developers and test engineers across different teams and time zones.
- Experience deploying services in Kubernetes.
- Datacenter or computer architecture experience is required: you should understand server, rack, and network topologies, as well as hardware/firmware/software interactions.
- Background with hardware management interfaces (Redfish, IPMI), baseboard management controllers (BMCs), and firmware update automation.
- Experience configuring and debugging complex data center networks.
- Experience developing scripts to automate recovery actions for management controllers and datacenter systems.
Benefits
- Equity
- Benefits
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
cluster management, server administration, network configuration, Kubernetes, firmware updates, operating systems deployment, high-performance applications, scripting for automation, hardware management protocols, data center network debugging
Soft Skills
troubleshooting, collaboration, communication, problem-solving, ownership, adaptability, teamwork, cross-team coordination, time management, effective working across time zones
Certifications
BS in Computer Science, MS in Computer Science, BS in Computer Engineering, MS in Computer Engineering, BS in Electrical Engineering, MS in Electrical Engineering