NVIDIA

Senior Software Engineer - Distributed Inference

full-time

Origin: 🇺🇸 United States • Arizona, Colorado

Salary

💰 $184,000 - $356,500 per year

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes, Microservices, Python, Rust

About the role

  • NVIDIA is seeking a Senior System Software Engineer to build user-facing tools for Dynamo Inference Server
  • Build and maintain distributed model management systems, including Rust-based runtime components for large-scale AI inference workloads
  • Implement inference scheduling and deployment solutions on Kubernetes and Slurm; drive advances in scaling, orchestration, and resource management
  • Collaborate with infrastructure engineers and researchers to develop scalable APIs, services, and end-to-end inference workflows
  • Create monitoring, benchmarking, automation, and documentation processes to ensure low-latency, robust, production-ready inference systems on GPU clusters
  • Work in a remote-friendly, fast-paced team focused on GPU-accelerated deep learning software

Requirements

  • Bachelor’s, Master’s, or PhD in Computer Science, ECE, or related field (or equivalent experience)
  • 6+ years of professional systems software development experience
  • Strong programming expertise in Rust (C++ and Python are a plus)
  • Deep knowledge of distributed systems, runtime orchestration, and cluster-scale services
  • Hands-on experience with Kubernetes, container-based microservices, and integration with Slurm
  • Proven ability to excel in fast-paced R&D environments and collaborate across functions
  • (Nice-to-have) Experience with Dynamo Inference Server, TensorRT, ONNX Runtime, and LLM inference pipelines at scale
  • (Nice-to-have) Contributions to large-scale, low-latency distributed systems and GPU inference performance tuning (CUDA, cloud-native/hybrid environments)