Lambda

Staff Network Architect, HPC

full-time

Origin: 🇺🇸 United States, California

Salary

💰 $349,000 - $523,000 per year

Job Level

Lead

Tech Stack

Cloud, Switching

About the role

  • Architect high-performance networking solutions that power cloud platforms, with a focus on ultra-low-latency and high-bandwidth connectivity
  • Define the network topology and architectural patterns for large-scale GPU clusters, storage backends, and multi-tenant environments
  • Evaluate, benchmark, and select next-generation network technologies (e.g., InfiniBand NDR/XDR, RoCE, 400G/800G Ethernet) to meet AI workload requirements
  • Develop and maintain network architecture standards, reference designs, and scalability roadmaps for multi-site and hybrid environments
  • Partner with compute and storage architects to ensure seamless end-to-end data flow and fault tolerance
  • Guide network automation strategies and tooling to enable efficient provisioning, telemetry, and operational visibility
  • Mentor engineers and cross-functional teams on advanced network concepts, troubleshooting, and architectural best practices

Requirements

  • Proven experience (7+ years) architecting high-performance data center networks, preferably for HPC, AI/ML, or large-scale cloud infrastructure
  • Deep expertise with InfiniBand (HDR/NDR) and advanced Ethernet fabrics, including RoCE and RDMA protocols
  • Strong understanding of data center switching architectures, congestion control, QoS, and network virtualization (VXLAN, EVPN)
  • Skilled in designing for low-latency and high-throughput data paths, including GPU-to-GPU and storage traffic optimization
  • Proficient with routing/switching protocols (BGP, OSPF) and software-defined networking (SDN) concepts
  • Experience building resilient, fault-tolerant network architectures with redundancy, failover, and high availability
  • Excellent communication and leadership skills, capable of influencing technical decisions across diverse teams
  • Willing and able to work onsite at our San Francisco office 4 days per week (Lambda’s designated work-from-home day is Tuesday)
  • Nice to have: Hands-on experience with AI workload profiling, collective communication patterns (e.g., NCCL, MPI), and network tuning for distributed training
  • Nice to have: Familiarity with network automation frameworks and telemetry tools
  • Nice to have: Exposure to DPU/SmartNIC technologies, including NVIDIA BlueField, or similar
  • Nice to have: Knowledge of large-scale, multi-site interconnect design, including DWDM or metro/long-haul networking
  • Nice to have: Experience collaborating with hyperscale or enterprise customers on highly customized network designs