Principal Software Engineer – Rack Scale Systems Infrastructure
NVIDIA
Principal Software Engineer at NVIDIA building software systems for rack-scale infrastructure capabilities, collaborating across teams to develop dependable, manageable, and programmable solutions for AI-powered applications.
Posted 5/16/2026 • Full-time • Remote • California, Massachusetts, North Carolina, Texas • 🇺🇸 United States • Lead • 💰 $272,000 - $431,250 per year
Tech Stack
Tools & technologies: Cloud, Distributed Systems, Go, Kubernetes, Linux, Open Source, Rust
About the role
Key responsibilities & impact
- Define the complete software architecture for rack-scale infrastructure products and services, covering control plane services, infrastructure management, firmware, operating systems, kernel drivers, networking fabrics, accelerator software, and user-mode manageability software.
- Use Kubernetes and cloud-native primitives as an infrastructure fabric when appropriate, including controllers, operators, reconciliation loops, and open source components that can operate safely at rack and fleet scale.
- Build open source infrastructure software that can be adopted in different forms, including libraries, services, controllers, operators, and integration APIs for internal deployments and CSP environments.
- Bridge hardware and software teams across firmware, BMC, BIOS, boot flows, OS images, drivers, networking, NVLink domains, InfiniBand, GPUs, DPUs, CPUs, and system management interfaces.
- Translate forward-looking infrastructure roadmaps into formal software requirements, architecture specifications, and execution plans that align teams across the organization.
- Partner directly with hyperscalers, CSPs, enterprise customers, internal component leads, vendors, and business partners to align infrastructure capabilities with real-world deployment and integration needs.
- Establish reliability, security, validation, and shift-left strategies that reduce risk before hardware reaches production environments.
- Mentor senior engineers and technical leads, raising the engineering bar for large-scale networked systems, foundational software, and rack-scale control plane development.
- Make high-quality technical decisions in ambiguous environments, balancing customer needs, schedule, hardware realities, software maintainability, open source adoption, and long-term infrastructure evolution.
Requirements
What you’ll need
- BS or MS in Computer Engineering, Computer Science, Electrical Engineering, or a related field, or equivalent experience.
- 15+ years of proven experience in systems architecture, system software, distributed systems, infrastructure control planes, or infrastructure engineering.
- Solid architectural knowledge of coordination frameworks, state machines, declarative APIs, reconciliation loops, lifecycle orchestration, failure handling, upgrade and rollback workflows, and distributed systems tradeoffs.
- Practical coding skills in Go, C++, or Rust, including the ability to write, review, and direct production-quality infrastructure software.
- Experience with Rust is highly valued.
- Experience with Kubernetes or similar orchestration systems, especially as a fabric for managing infrastructure, hardware resources, or large-scale infrastructure services.
- Experience with Linux-based infrastructure software, OS rollout and image management, kernel or driver interactions, firmware lifecycle, and hardware bring-up workflows.
- Strong understanding of data center networking technologies and protocols, such as Ethernet, InfiniBand, RDMA, and fabric-level manageability.
- Experience with complex accelerator-based systems, including GPUs, DPUs, FPGAs, custom silicon, or other high-performance computing systems.
- Expertise in in-band and out-of-band management architectures, including BMCs, Redfish, IPMI, and related system management protocols.
- Ability to work with security experts to define practical tradeoffs across secure boot, attestation, access control, update safety, serviceability, and ease of operation.
- Experience crafting software intended for open source release, including API stability, modularity, documentation, community usability, and clean separation between shared software and deployment-specific integrations.
- Experience using AI-assisted development tools responsibly as an engineering multiplier for coding, test generation, debugging, build iteration, and documentation.
- Proven ability to specify requirements, guide architecture, and manage delivery across multiple engineering teams and organizations.
- Strong written and verbal communication skills, enabling clear explanation of complex hardware/software tradeoffs to engineering leaders, customers, partners, and executives.
Benefits
Comp & perks
- equity
- benefits