Staff Network Architect, HPC

Lambda

full-time

Posted on: 9/4/2025

Location: California • 🇺🇸 United States

Salary

💰 $349,000 - $523,000 per year

Job Level

Lead

Tech Stack

Cloud, Switching

About the role

Architect high-performance networking solutions that power cloud platforms, with a focus on ultra-low-latency and high-bandwidth connectivity
Define the network topology and architectural patterns for large-scale GPU clusters, storage backends, and multi-tenant environments
Evaluate, benchmark, and select next-generation network technologies (e.g., InfiniBand NDR/XDR, RoCE, 400G/800G Ethernet) to meet AI workload requirements
Develop and maintain network architecture standards, reference designs, and scalability roadmaps for multi-site and hybrid environments
Partner with compute and storage architects to ensure seamless end-to-end data flow and fault tolerance
Guide network automation strategies and tooling to enable efficient provisioning, telemetry, and operational visibility
Mentor engineers and cross-functional teams on advanced network concepts, troubleshooting, and architectural best practices

Requirements

Proven experience (7+ years) architecting high-performance data center networks, preferably for HPC, AI/ML, or large-scale cloud infrastructure
Deep expertise with InfiniBand (HDR/NDR) and advanced Ethernet fabrics, including RoCE and RDMA protocols
Strong understanding of data center switching architectures, congestion control, QoS, and network virtualization (VXLAN, EVPN)
Skilled in designing for low-latency and high-throughput data paths, including GPU-to-GPU and storage traffic optimization
Proficient with routing/switching protocols (BGP, OSPF) and software-defined networking (SDN) concepts
Experience building resilient, fault-tolerant network architectures with redundancy, failover, and high availability
Excellent communication and leadership skills, capable of influencing technical decisions across diverse teams
Willing and able to work onsite at our San Francisco office 4 days per week (Lambda’s designated work-from-home day is Tuesday)
Nice to have: Hands-on experience with AI workload profiling, collective communication patterns (e.g., NCCL, MPI), and network tuning for distributed training
Nice to have: Familiarity with network automation frameworks and telemetry tools
Nice to have: Exposure to DPU/SmartNIC technologies, including NVIDIA BlueField, or similar
Nice to have: Knowledge of large-scale, multi-site interconnect design, including DWDM or metro/long-haul networking
Nice to have: Experience collaborating with hyperscale or enterprise customers on highly customized network designs