Senior Systems Engineer, Storage – DGX Cloud
NVIDIA
Senior Systems Engineer designing and operating large-scale Kubernetes storage platforms. Collaborating with teams to ensure reliability, observability, and performance in production systems.
Posted 6/9/2026
Full-time
Remote • California, Colorado, Illinois, North Carolina, Oregon • 🇺🇸 United States
Senior
💰 $208,000 - $414,000 per year
Tech Stack
Tools & technologies: Ansible, Chef, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Puppet, Python, Terraform
About the role
Key responsibilities & impact
- Design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, including the manifests, Helm charts, and operators that run them.
- Build tools, services, and automation that improve the lifecycle of storage and data systems – from provisioning and configuration through deployment, scaling, and day-2 operations.
- Develop and operate telemetry and observability for production systems – metrics, logging, tracing, dashboards, and alerting – so that system health, availability, and latency are measurable and actionable.
- Apply strong analytical troubleshooting skills to diagnose and resolve complex issues across distributed, containerized infrastructure.
- Work closely with peers and partner teams to improve the lifecycle of services, from inception and design through deployment, operation, and refinement.
- Scale systems sustainably through automation, infrastructure-as-code, and CI/CD, and evolve systems by pushing for changes that improve reliability and velocity.
- Support services before they go live through activities such as deployment automation, capacity planning, and launch and readiness reviews.
- Practice sustainable incident response and postmortems, and participate in an on-call rotation to support production systems.
Requirements
What you’ll need
- BS degree (or equivalent experience) in Computer Science or related technical field involving coding.
- 12+ years of practical experience.
- Hands-on experience with Kubernetes – deploying, configuring, and operating workloads and solutions on Kubernetes in production.
- Experience building tools and services for storage, data, or platform infrastructure, with solid software design fundamentals (algorithms, data structures, complexity analysis) on large-scale Linux-based systems.
- Experience building and operating telemetry and observability using tools such as Prometheus, InfluxDB, Grafana, and the Elastic stack.
- Strong analytical troubleshooting skills with a systematic, root-cause-driven approach to identifying and resolving complex problems.
- Proficiency in one or more of the following: Python, Go, or Java.
- Good knowledge of infrastructure configuration management and infrastructure-as-code tools such as Ansible, Chef, Puppet, ArgoCD, Git Pipelines, and Terraform.
Benefits
Comp & perks
- Equity
- Health insurance
- Retirement plans
- Paid time off
- Professional development opportunities
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, Python, Go, Java, infrastructure-as-code, telemetry, observability, software design fundamentals, Linux, troubleshooting
Soft Skills
analytical skills, problem-solving, collaboration, communication, systematic approach