NVIDIA

Senior DGX Cloud Software Engineer – Infrastructure Automation, Distributed Systems

full-time

Posted on:

Location Type: Remote

Location: Remote • Arizona, Colorado, Illinois, Texas • 🇺🇸 United States

Salary

💰 $168,000 - $333,500 per year

Job Level

Senior

Tech Stack

Cloud, Distributed Systems, Go, Kubernetes, Linux, Python

About the role

  • Design, build, and run in-scope cloud infrastructure services that meet our business goals, performing integrations, migrations, bring-ups, updates, and decommissions as needed.
  • Participate in the definition of our internal facing service level objectives and error budgets as part of our overall observability strategy.
  • Eliminate toil, or automate it where the return on building and maintaining the automation justifies the cost.
  • Practice sustainable, blameless incident prevention and incident response as a member of an on-call rotation.
  • Consult with peer teams on systems design best practices, both seeking and giving advice.
  • Participate in a supportive culture of values-driven introspection, communication, and self-organization.

Requirements

  • Proficiency in at least one of the following programming languages: Python or Go.
  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics) or equivalent experience.
  • 5+ years of relevant experience in infrastructure and fleet management engineering.
  • Experience with infrastructure automation and distributed systems design, including developing tools for running large-scale private or public cloud systems that require fully automated management and are under active customer use in production.
  • A track record demonstrating a mix of initiating your own projects, convincing others to collaborate with you, and collaborating well on projects initiated by others.
  • In-depth knowledge of one or more of the following: Linux, Slurm, Kubernetes, local and distributed storage, or systems networking.

Benefits

  • Equity
  • Benefits
