VP of Engineering

Hyperbolic

VP of Engineering leading the design and evolution of AI cloud infrastructure at Hyperbolic Labs. Building GPU-native cloud systems and managing world-class engineering teams.

Posted 6/12/2026 • Full-time • San Francisco • California • 🇺🇸 United States • Lead

ATS Keywords

Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

GPU orchestration, compute scheduling, distributed systems, Kubernetes, Infrastructure-as-Code, automation frameworks, observability, monitoring, reliability engineering, AI/ML compute platforms

Soft Skills

leadership, technical decision-making, collaboration, problem-solving, team development, strategic planning, execution, communication, ownership, operational excellence

Tools & Technologies

cloud infrastructure, bare-metal deployments, SRE, CI/CD, incident response processes, capacity planning, vendor management, Linux, networking, storage architecture

Industry Keywords

large-scale infrastructure systems, high-growth startup environments, multi-region cloud infrastructure, production systems, GPU scheduling, Slurm, Kubernetes GPU operators, Ray, AI training, inference platforms

Tech Stack

Tools & technologies

Cloud, Distributed Systems, Kubernetes, Linux, Ray

About the role

Key responsibilities & impact

Lead the design and evolution of our AI cloud platform
Define the architecture for GPU orchestration, compute scheduling, networking, storage, and distributed systems
Make critical decisions regarding cloud infrastructure, bare-metal deployments, and platform scalability
Personally participate in architecture reviews and key technical initiatives
Build and scale large GPU clusters supporting customer workloads
Design systems for GPU provisioning, scheduling, utilization optimization, and capacity management
Drive platform reliability and performance for AI training and inference workloads
Partner closely with engineering teams on infrastructure requirements for next-generation AI systems
Remain deeply involved in engineering decisions and technical direction
Contribute directly to infrastructure design and implementation efforts
Review architecture proposals, system designs, and major infrastructure changes
Act as the technical escalation point for complex infrastructure challenges
Establish best practices for Kubernetes, observability, CI/CD, security, and operational excellence
Build SRE and Platform Engineering functions from the ground up
Define reliability standards including SLOs, SLIs, incident response processes, and capacity planning
Drive automation across infrastructure operations
Recruit and develop world-class Infrastructure, Platform, and SRE teams
Build a high-performance engineering culture focused on ownership and execution
Partner with executive leadership on company strategy and infrastructure investments
Manage infrastructure budgets, vendor relationships, and capacity planning

Requirements

What you’ll need

12+ years building and operating large-scale infrastructure systems
Experience leading infrastructure organizations while remaining hands-on technically
Previous experience building or operating a cloud platform at scale
Experience building GPU infrastructure or AI/ML compute platforms
Proven track record scaling infrastructure in high-growth startup environments
Expert-level Kubernetes knowledge
Experience designing and operating multi-region cloud infrastructure
Strong understanding of Linux, networking, distributed systems, and storage architecture
Experience with Infrastructure-as-Code and automation frameworks
Deep expertise in observability, monitoring, and reliability engineering
Experience building highly available production systems
Strongly Preferred: Experience with GPU scheduling, Slurm, Kubernetes GPU operators, Ray, or distributed training systems
Experience managing thousands of GPUs in production environments
Background supporting AI training and inference platforms

Benefits

Comp & perks

Health insurance
Professional development
Flexible work arrangements
Paid time off