VP of Engineering
Hyperbolic
VP of Engineering leading the design and evolution of AI cloud infrastructure at Hyperbolic Labs. Building GPU-native cloud systems and managing world-class engineering teams.
Tech Stack
Tools & technologies: Cloud, Distributed Systems, Kubernetes, Linux, Ray
About the role
Key responsibilities & impact
- Lead the design and evolution of our AI cloud platform
- Define the architecture for GPU orchestration, compute scheduling, networking, storage, and distributed systems
- Make critical decisions regarding cloud infrastructure, bare-metal deployments, and platform scalability
- Personally participate in architecture reviews and key technical initiatives
- Build and scale large GPU clusters supporting customer workloads
- Design systems for GPU provisioning, scheduling, utilization optimization, and capacity management
- Drive platform reliability and performance for AI training and inference workloads
- Partner closely with engineering teams on infrastructure requirements for next-generation AI systems
- Remain deeply involved in engineering decisions and technical direction
- Contribute directly to infrastructure design and implementation efforts
- Review architecture proposals, system designs, and major infrastructure changes
- Act as the technical escalation point for complex infrastructure challenges
- Establish best practices for Kubernetes, observability, CI/CD, security, and operational excellence
- Build SRE and Platform Engineering functions from the ground up
- Define reliability standards including SLOs, SLIs, incident response processes, and capacity planning
- Drive automation across infrastructure operations
- Recruit and develop world-class Infrastructure, Platform, and SRE teams
- Build a high-performance engineering culture focused on ownership and execution
- Partner with executive leadership on company strategy and infrastructure investments
- Manage infrastructure budgets, vendor relationships, and capacity planning
Requirements
What you’ll need
- 12+ years building and operating large-scale infrastructure systems
- Experience leading infrastructure organizations while remaining hands-on technically
- Previous experience building or operating a cloud platform at scale
- Experience building GPU infrastructure or AI/ML compute platforms
- Proven track record scaling infrastructure in high-growth startup environments
- Expert-level Kubernetes knowledge
- Experience designing and operating multi-region cloud infrastructure
- Strong understanding of Linux, networking, distributed systems, and storage architecture
- Experience with Infrastructure-as-Code and automation frameworks
- Deep expertise in observability, monitoring, and reliability engineering
- Experience building highly available production systems
- Strongly Preferred: Experience with GPU scheduling, Slurm, Kubernetes GPU operators, Ray, or distributed training systems
- Experience managing thousands of GPUs in production environments
- Background supporting AI training and inference platforms
Benefits
Comp & perks
- Health insurance
- Professional development
- Flexible work arrangements
- Paid time off
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
GPU orchestration, compute scheduling, distributed systems, Kubernetes, Infrastructure-as-Code, automation frameworks, observability, monitoring, reliability engineering, AI/ML compute platforms
Soft Skills
leadership, technical decision-making, collaboration, problem-solving, team development, strategic planning, execution, communication, ownership, operational excellence