NVIDIA

Principal Software Engineer – Rack Scale Systems Infrastructure

Principal Software Engineer at NVIDIA building software systems for rack-scale infrastructure capabilities. Collaborating across teams to develop dependable, manageable, and programmable solutions for AI-powered applications.

Posted 5/16/2026 • Full-time • Remote • California, Massachusetts, North Carolina, Texas • 🇺🇸 United States • Lead • 💰 $272,000 - $431,250 per year

Tech Stack

Tools & technologies
Cloud, Distributed Systems, Go, Kubernetes, Linux, Open Source, Rust

About the role

Key responsibilities & impact
  • Define the complete software architecture for rack-scale infrastructure products and services, covering control plane services, infrastructure management, firmware, operating systems, kernel drivers, networking fabrics, accelerator software, and user-mode manageability software.
  • Use Kubernetes and cloud-native primitives as an infrastructure fabric when appropriate, including controllers, operators, reconciliation loops, and open source components that can operate safely at rack and fleet scale.
  • Build open source infrastructure software that can be adopted in different forms, including libraries, services, controllers, operators, and integration APIs for internal deployments and CSP environments.
  • Bridge hardware and software teams across firmware, BMC, BIOS, boot flows, OS images, drivers, networking, NVLink domains, InfiniBand, GPUs, DPUs, CPUs, and system management interfaces.
  • Translate forward-looking infrastructure roadmaps into formal software requirements, architecture specifications, and execution plans that align teams across the organization.
  • Partner directly with hyperscalers, CSPs, enterprise customers, internal component leads, vendors, and business partners to align infrastructure capabilities with real-world deployment and integration needs.
  • Establish reliability, security, validation, and shift-left strategies that reduce risk before hardware reaches production environments.
  • Mentor senior engineers and technical leads, raising the engineering bar for large-scale networked systems, foundational software, and rack-scale control plane development.
  • Make high-quality technical decisions in ambiguous environments, balancing customer needs, schedule, hardware realities, software maintainability, open source adoption, and long-term infrastructure evolution.

Requirements

What you’ll need
  • BS or MS in Computer Engineering, Computer Science, Electrical Engineering, or a related field, or equivalent experience.
  • Proven experience (15+ years) in systems architecture, system software, distributed systems, infrastructure control planes, or infrastructure engineering.
  • Solid architectural knowledge of coordination frameworks, state machines, declarative APIs, reconciliation loops, lifecycle orchestration, failure handling, upgrade and rollback workflows, and distributed systems tradeoffs.
  • Practical coding skills in Go, C++, or Rust, including the ability to write, review, and direct production-quality infrastructure software.
  • Experience with Rust is highly valued.
  • Experience with Kubernetes or similar orchestration systems, especially as a fabric for managing infrastructure, hardware resources, or large-scale infrastructure services.
  • Experience with Linux-based infrastructure software, OS rollout and image management, kernel or driver interactions, firmware lifecycle, and hardware bring-up workflows.
  • Strong understanding of data center networking technologies and protocols, such as Ethernet, InfiniBand, RDMA, and fabric-level manageability.
  • Experience with complex accelerator-based systems, including GPUs, DPUs, FPGAs, custom silicon, or other high-performance computing systems.
  • Expertise in in-band and out-of-band management architectures, including BMCs, Redfish, IPMI, and related system management protocols.
  • Ability to work with security experts to define practical tradeoffs across secure boot, attestation, access control, update safety, serviceability, and ease of operation.
  • Experience crafting software intended for open source release, including API stability, modularity, documentation, community usability, and clean separation between shared software and deployment-specific integrations.
  • Experience using AI-assisted development tools responsibly as an engineering multiplier for coding, test generation, debugging, build iteration, and documentation.
  • Proven ability to specify requirements, guide architecture, and manage delivery across multiple engineering teams and organizations.
  • Strong written and verbal communication skills, with the ability to clearly explain complex hardware/software tradeoffs to engineering leaders, customers, partners, and executives.

Benefits

Comp & perks
  • equity
  • benefits

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
systems architecture, system software, distributed systems, infrastructure control planes, Go, C++, Rust, Linux-based infrastructure software, data center networking technologies, in-band and out-of-band management architectures
Soft Skills
mentoring, technical decision-making, communication, collaboration, requirement specification, architecture guidance, delivery management, problem-solving, risk management, customer alignment