Lambda

Staff Compute Architect, HPC

full-time

Location: United States • California

Salary

💰 $349,000 - $523,000 per year

Job Level

Lead

Tech Stack

Cloud, Kubernetes

About the role

  • Architect and define scalable compute platforms optimized for AI/ML, simulation, and high-throughput workloads
  • Develop compute system standards and design patterns to ensure consistency, performance, and maintainability across infrastructure
  • Evaluate emerging CPU, GPU, and accelerator technologies and own architectural tradeoff decisions affecting compute density, power, cooling, and total cost
  • Collaborate with product and engineering teams to map workload requirements to compute platform capabilities across bare metal and cloud deployments
  • Define compute platform roadmaps and architectural reference designs guiding hardware selection, firmware baselines, and rack-level configurations
  • Act as a technical lead during new platform introductions, guiding validation and performance characterization efforts
  • Mentor systems engineers and cross-functional stakeholders on compute performance tuning, sizing, and architectural decisions

Requirements

  • 7+ years of proven experience architecting large-scale HPC or cloud compute platforms
  • Deep knowledge of CPU/GPU architectures, including system-on-chip integration, memory hierarchies, and accelerator topologies
  • Experience designing systems around high-bandwidth, low-latency fabrics (NVLink, InfiniBand)
  • Strong understanding of system performance tuning, resource scheduling, thermal and power optimization, and compute lifecycle management
  • Comfortable working across hardware and software boundaries, especially at the intersection of compute architecture, OS behavior, and orchestration layers
  • Skilled at balancing architectural tradeoffs for density, power efficiency, cooling, and performance
  • Strong analytical and communication skills, with a track record of influencing technical strategy across teams
  • Willingness and ability to work onsite at Lambda's San Francisco office 4 days per week
  • Nice to have: Hands-on experience with AI/ML workloads and their compute performance characteristics
  • Nice to have: Familiarity with orchestration tools used in HPC, such as Slurm and Kubernetes
  • Nice to have: Experience with virtualization technologies, specifically GPU virtualization
  • Nice to have: Exposure to hardware validation, vendor collaboration, and long-term OEM roadmap alignment
  • Nice to have: Background in compute telemetry, real-time performance profiling, or large-scale A/B infrastructure testing