Lambda

Super Intelligence HPC Support Engineer

full-time

Location Type: Remote

Location: Remote • 🇺🇸 United States

Salary

💰 $160,000 - $206,000 per year

Job Level

Senior, Lead

Tech Stack

Cloud, Grafana, Kubernetes, Linux, Prometheus

About the role

  • Act as the primary technical point of escalation for Super Intelligence customers running hyperscale GPU clusters.
  • Lead incident response for complex issues, ensuring rapid triage, clear communication, and timely resolution.
  • Proactively identify risks in large environments (firmware, performance bottlenecks, orchestration issues) and drive preventative improvements.
  • Partner closely with Lambda Engineering and Product teams to influence roadmap decisions based on real customer needs.
  • Contribute to runbooks, best practices, and operational guides tailored for hyperscale environments.
  • Train and mentor other support engineers, raising the bar across Lambda’s support organization.
  • Participate in a rotating on-call schedule, owning critical incidents and high-priority alerts for SI customers.

Requirements

  • 7+ years of experience in HPC or cloud support engineering, with customer-facing responsibilities.
  • Proven experience managing large-scale Linux clusters and distributed HPC/AI workloads.
  • Deep expertise in orchestration tools such as Kubernetes and/or Slurm.
  • Strong knowledge of GPU technologies (CUDA, NCCL, MIG, NVLink, GPUDirect RDMA).
  • Skilled in high-throughput networking (InfiniBand, RoCE) and cluster storage solutions.
  • Familiarity with monitoring/logging platforms (Prometheus, Grafana, Datadog).
  • Experience leading incident management and communicating directly with enterprise or hyperscale customers.
  • Ability to balance deep technical troubleshooting with clear, concise communication to executives and stakeholders.

Benefits

  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401(k) plan with 2% company match (USA employees)
  • Flexible paid time off plan that we all actually use
