
Manager, Super Intelligence HPC Support
Lambda
full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Salary
💰 $160,000 - $282,000 per year
Job Level
Mid-Level, Senior
Tech Stack
Cloud, Kubernetes, Linux
About the role
- Lead & Develop: Build, coach, and mentor a team of Super Intelligence HPC Support Engineers, ensuring technical excellence and strong execution in customer-facing work.
- Escalation Ownership: Take point on high-visibility incidents and escalations with hyperscale customers, ensuring timely, transparent, and high-quality outcomes.
- Customer Advocacy: Represent the needs of Super Intelligence customers in cross-functional discussions, influencing product design and roadmap decisions to improve supportability.
- Incident Leadership: Guide your team through major incidents, driving consistency in communication, coordination, and resolution under pressure.
- Operational Excellence: Define and refine support processes, runbooks, and documentation tailored to hyperscale environments.
- Partnership: Collaborate closely with Product, Engineering, and Data Center teams to ensure Lambda delivers reliable, scalable solutions at the largest levels of deployment.
- Metrics & Accountability: Monitor team performance and drive improvements in SLA adherence, response/resolution quality, and customer satisfaction.
- Hands-On Leadership: Step in to troubleshoot complex issues and model the standard of excellence expected from your team.
Requirements
- Proven track record leading technical support or engineering teams serving enterprise or hyperscale customers.
- Skilled at managing customer escalations and major incidents with clarity, confidence, and urgency.
- Deep expertise in HPC environments, including GPU clusters, InfiniBand/RoCE networks, and Linux system administration.
- Ability to guide engineers through troubleshooting at scale, from orchestration (Slurm/Kubernetes) down to kernel-level debugging.
- Strong leadership presence: able to inspire, set direction, and build a culture of accountability and customer-first execution.
- Excellent communication skills, capable of engaging with both engineers and executive stakeholders.
- Advanced degree in Computer Science, Engineering, or related field (nice to have).
- Certifications in HPC, networking, or related technologies (nice to have).
- Experience with Slurm, Kubernetes, InfiniBand, and other high-performance interconnects (RoCE, NVLink/NVSwitch) (nice to have).
- Background supporting Private Cloud environments or other dedicated enterprise clusters (nice to have).
- Experience supporting enterprise AI workloads across startups and Fortune 500 companies (nice to have).
Benefits
- Health, dental, and vision coverage for you and your dependents
- Wellness and Commuter stipends for select roles
- 401(k) plan with 2% company match (USA employees)
- Flexible Paid Time Off Plan that we all actually use
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
HPC environments, GPU clusters, InfiniBand, RoCE networks, Linux system administration, Slurm, Kubernetes, kernel-level debugging, troubleshooting, enterprise AI workloads
Soft skills
leadership, communication, customer advocacy, incident management, team mentoring, operational excellence, collaboration, accountability, problem-solving, influencing
Certifications
HPC certification, networking certification