NVIDIA

Senior AI and ML Storage Engineer

full-time

Location Type: Hybrid

Location: Santa Clara • California, Washington • 🇺🇸 United States

Salary

💰 $184,000 - $356,500 per year

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, Distributed Systems, Go, Google Cloud Platform, Kubernetes, Python

About the role

  • Design, develop, and operate distributed systems that manage data, compute, and networking for large-scale AI workloads.
  • Build software and automation to orchestrate workloads across thousands of GPUs and petabytes of storage in multi-region clusters.
  • Collaborate with AI/ML research teams to understand their requirements and translate them into scalable, high-performance solutions.
  • Drive improvements in system reliability, performance, and observability to meet exascale standards.
  • Partner with security, networking, and platform teams to ensure that MARS infrastructure meets the highest standards of robustness and compliance.
  • Participate in design reviews, contribute to system architecture discussions, and influence the evolution of NVIDIA’s AI infrastructure stack.
  • Stay current with advances in distributed systems, large-scale computing, and AI frameworks to help shape the future direction of MARS.

Requirements

  • BS or equivalent experience in Computer Science, Computer Engineering, or a related technical field.
  • 8+ years of experience developing and operating large-scale distributed systems, infrastructure platforms, or HPC environments.
  • Strong programming skills in C++, Python, or Go, with proven experience designing production-quality software systems.
  • Solid understanding of distributed systems principles, data management, and large-scale orchestration frameworks.
  • Hands-on experience with high-performance storage (e.g., Lustre, GPFS, BeeGFS) and compute scheduling and orchestration (e.g., Slurm, Kubernetes, LSF).
  • Familiarity with cloud environments (Azure, AWS, GCP) and infrastructure automation tools.
  • Strong problem-solving skills, ownership mindset, and the ability to thrive in a fast-paced, collaborative environment.
  • Excellent communication skills and a track record of cross-functional collaboration.

Benefits

  • Equity
  • Benefits