Senior Software Engineer - Distributed Inference

NVIDIA

full-time

Posted on: 8/26/2025

Origin: 🇺🇸 United States (Arizona, Colorado)

💰 $184,000 - $356,500 per year

Senior

AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes, Microservices, Python, Rust

About the role

NVIDIA is seeking a Senior System Software Engineer to work on user-facing tools for the Dynamo Inference Server
Build and maintain distributed model management systems, including Rust-based runtime components for large-scale AI inference workloads
Implement inference scheduling and deployment solutions on Kubernetes and Slurm; drive advances in scaling, orchestration, and resource management
Collaborate with infrastructure engineers and researchers to develop scalable APIs, services, and end-to-end inference workflows
Create monitoring, benchmarking, automation, and documentation processes to ensure low-latency, robust, production-ready inference systems on GPU clusters
Work in a remote-friendly, fast-paced team focused on GPU-accelerated deep learning software

Requirements

Bachelor’s, Master’s, or PhD in Computer Science, ECE, or related field (or equivalent experience)
6+ years of professional systems software development experience
Strong programming expertise in Rust (C++ and Python are a plus)
Deep knowledge of distributed systems, runtime orchestration, and cluster-scale services
Hands-on experience with Kubernetes, container-based microservices, and integration with Slurm
Proven ability to excel in fast-paced R&D environments and collaborate across functions
(Nice-to-have) Experience with Dynamo Inference Server, TensorRT, ONNX Runtime, and LLM inference pipelines at scale
(Nice-to-have) Contributions to large-scale, low-latency distributed systems and GPU inference performance tuning (CUDA, cloud-native/hybrid environments)