NVIDIA

Senior Systems Engineer, Storage – DGX Cloud

Senior Systems Engineer designing and operating large-scale Kubernetes storage platforms. Collaborating with teams to ensure reliability, observability, and performance in production systems.

Posted 6/9/2026 • Full-time • Remote • California, Colorado, Illinois, North Carolina, Oregon • 🇺🇸 United States • Senior • 💰 $208,000 - $414,000 per year

Tech Stack

Tools & technologies
Ansible, Chef, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Puppet, Python, Terraform

About the role

Key responsibilities & impact
  • Design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, including the manifests, Helm charts, and operators that run them.
  • Build tools, services, and automation that improve the lifecycle of storage and data systems – from provisioning and configuration through deployment, scaling, and day-2 operations.
  • Develop and operate telemetry and observability for production systems – metrics, logging, tracing, dashboards, and alerting – so that system health, availability, and latency are measurable and actionable.
  • Apply strong analytical troubleshooting skills to diagnose and resolve complex issues across distributed, containerized infrastructure.
  • Work closely with peers and partner teams to improve the lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Scale systems sustainably through automation, infrastructure-as-code, and CI/CD, and evolve systems by pushing for changes that improve reliability and velocity.
  • Support services before they go live through activities such as deployment automation, capacity planning, and launch and readiness reviews.
  • Practice sustainable incident response and postmortems, and participate in an on-call rotation to support production systems.

Requirements

What you’ll need
  • BS degree (or equivalent experience) in Computer Science or a related technical field involving coding.
  • 12+ years of practical experience.
  • Hands-on experience with Kubernetes – deploying, configuring, and operating workloads and solutions on Kubernetes in production.
  • Experience building tools and services for storage, data, or platform infrastructure, with solid software design fundamentals (algorithms, data structures, complexity analysis) on large-scale Linux-based systems.
  • Experience building and operating telemetry and observability using tools such as Prometheus, InfluxDB, Grafana, and the Elastic stack.
  • Strong analytical troubleshooting skills with a systematic, root-cause-driven approach to identifying and resolving complex problems.
  • Proficiency in one or more of the following: Python, Go, or Java.
  • Good knowledge of infrastructure configuration management and infrastructure-as-code tools such as Ansible, Chef, Puppet, ArgoCD, Git Pipelines, and Terraform.

Benefits

Comp & perks
  • Equity
  • Health insurance
  • Retirement plans
  • Paid time off
  • Professional development opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kubernetes, Python, Go, Java, infrastructure-as-code, telemetry, observability, software design fundamentals, Linux, troubleshooting
Soft Skills
analytical skills, problem-solving, collaboration, communication, systematic approach