Build comprehensive automation pipelines for infrastructure provisioning and service deployments
Provide operational support by diagnosing, triaging, and resolving complex system issues
Design and deploy bare metal Kubernetes clusters for GPU/AI workloads in customer datacenters
Design and implement datacenter networking with NVIDIA BlueField-3 DPUs
Configure and troubleshoot InfiniBand fabrics for high-performance GPU interconnects
Implement Metal3-based bare metal provisioning pipelines for physical server infrastructure
Configure and integrate KubeVirt for VM-based workloads on Kubernetes
Deploy and manage k0rdent (Cluster API-based) tooling for lifecycle management of tenant Kubernetes clusters
Implement GPU workload onboarding systems for training and inference
Build automation using GitHub CI for product integration testing
Work directly with product teams to collect and drive requirements for future features and fixes
Requirements
Advanced Kubernetes expertise - Hands-on experience operating production clusters, including:
Deep understanding of Kubernetes architecture, controllers, and operators
Experience with Cluster API lifecycle management and upgrades
Troubleshooting complex multi-tenant environments
Custom Resource Definitions (CRDs) and operator patterns
Bare metal infrastructure management - Direct experience with physical server provisioning and management, BIOS/firmware management, and hardware lifecycle automation
Virtualization technologies - Practical experience with KVM, libvirt, and VM management on Linux
Software Defined Networking (SDN) - Understanding of overlay networks, network policies, and SDN controllers in Kubernetes and VM environments
Golang proficiency - Ability to read, debug, and contribute to Kubernetes operator code and controllers
CI/CD automation - Strong scripting and automation skills (Bash, Python, Ansible, Terraform) and experience building infrastructure-as-code pipelines
GitOps practices - Experience with declarative infrastructure management and Git-based workflows
Will be a strong plus:
InfiniBand networking experience
Cluster API framework experience
NVIDIA GPU infrastructure (NVLink)
SmartNIC experience (NVIDIA BlueField or similar)
OVN (Open Virtual Network) or other SDN platforms
Metal3 or similar bare metal provisioning tools
Storage networking (NVMe-oF, Ceph)
GitHub Actions/CI or similar automation platforms
Benefits
Work with an established Silicon Valley leader in the cloud infrastructure industry;
Work with exceptionally passionate, talented and engaging colleagues, helping Fortune 500 and Global 2000 customers implement next-generation cloud technologies;
Be a part of cutting-edge, open-source innovation;
Thrive in the high-energy environment of a young company where openness, collaboration, risk-taking, and continuous growth are valued;
Professional development and training;
Attend conferences and working groups;
Company outings, happy hours, hackathons, and tech talks;
Receive a competitive compensation package with a strong benefits plan.