
Capacity and Infrastructure Operations Manager
Parasail.ai
full-time
Location Type: Hybrid
Location: San Mateo • California • United States
About the role
- Own real-time fleet utilization: identify and resolve idle capacity, inefficiencies, and demand/supply mismatches.
- Define utilization targets and operating policies that balance performance, reliability, and cost.
- Develop policies and processes for lifecycle management of vendor-sourced instances (bring-up, steady state, rebalancing, decommissioning).
- Partner with Engineering to define requirements and prioritize automations for capacity acquisition, scaling, rebalancing, failovers, and cost controls.
- Model and monitor GPU unit economics: cost per GPU-hour, marginal cost, blended vendor rates, and cost leakage.
- Partner with Finance & Product to align customer pricing with underlying vendor economics.
- Deliver monthly/quarterly reporting on supply-side cost trends and margin performance, including key drivers and recommended actions.
- Recommend improvements to pricing, contract mix, vendor allocation, and operational policies to expand gross margin.
- Build and maintain forecasting models to predict demand, burst behavior, seasonality, and reserve requirements.
- Determine the optimal mix of contract types (on-demand, committed use, short-term) to maximize flexibility and margin.
- Maintain capacity buffers and contingency plans to protect against vendor outages, degraded performance, or sudden demand spikes.
- Source, evaluate, and manage relationships with neocloud and GPU infrastructure providers.
- Negotiate pricing, SLAs, commitments, contractual flexibility, and scaling terms.
- Create and maintain vendor scorecards (pricing, reliability, latency, responsiveness, and fit).
- Identify emerging vendors, negotiate trial capacity, and assess cost–performance tradeoffs.
- Develop a multi-vendor redundancy strategy to minimize single-provider risk.
- Stand up the core dashboards and operating cadence to monitor and manage key metrics, for example: utilization per GPU family, utilization by contract type, idle capacity, blended cost per GPU-hour, and vendor latency/performance.
- Help define a practical tool stack for capacity planning and financial analysis.
Requirements
- 5+ years in capacity operations, infrastructure operations, technical operations, cloud supply/vendor ops, or a closely related role.
- Demonstrated experience managing external infrastructure vendors (performance management, SLAs, commercial terms, escalation paths).
- Strong analytical skills and comfort with unit economics (cost drivers, margin, pricing inputs, forecasting).
- Experience building operating cadences and dashboards that drive action (not just reporting).
- Clear communicator who can align Engineering, Finance, Product, and GTM around priorities and tradeoffs.
Benefits
- Health insurance
- 401(k) matching
- Flexible work hours
- Paid time off
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
capacity operations, infrastructure operations, cloud supply operations, vendor management, unit economics, forecasting, dashboard creation, operating policies, cost analysis, vendor scorecards
Soft Skills
analytical skills, communication, collaboration, negotiation, problem-solving, relationship management, strategic thinking, reporting, action-oriented, flexibility