
Manager, Datacenter Network Engineering
RunPod
full-time
Location Type: Remote
Location: United States
Salary
💰 $150,000 - $240,000 per year
About the role
- Manage and grow a team of network engineers responsible for datacenter fabrics, interconnects, and global WAN connectivity. Provide mentorship, technical guidance, and clear ownership boundaries.
- Define and evolve network designs for GPU-heavy clusters, including spine-leaf topologies, ECMP routing, and high-bandwidth east-west traffic patterns.
- Oversee design and operation of InfiniBand and RoCE-based fabrics supporting distributed training and inference workloads. Ensure performance, loss characteristics, and congestion control meet AI workload requirements.
- Guide implementation and operation of network overlay technologies such as VXLAN, EVPN, Geneve, or similar, enabling scalable multi-tenant isolation and flexible network provisioning.
- Lead strategy and execution for global WAN connectivity, including private backbone links, IX connectivity, and hybrid connectivity with cloud providers and partners.
- Establish operational best practices for monitoring, capacity planning, change management, incident response, and post-mortems across the network stack.
- Partner closely with Infrastructure, SRE, Hardware, and Product Engineering teams to ensure network capabilities align with platform and customer requirements.
- Work with hardware vendors, colocation providers, and transit partners on network design, procurement, deployment timelines, and escalations.
- Ensure network designs support secure isolation, DDoS resilience, and compliance requirements without compromising performance.
Requirements
- 3+ years managing network or infrastructure engineering teams, with experience scaling teams and systems in production environments.
- 8+ years designing and operating large-scale datacenter networks, including spine-leaf architectures, BGP-based routing, and high-throughput fabrics.
- Strong hands-on experience with VXLAN/EVPN or equivalent overlay technologies, including control-plane and data-plane considerations.
- Proven experience with InfiniBand and/or RoCE, including congestion management, lossless Ethernet concepts, and performance tuning for GPU workloads.
- Deep familiarity with global WAN technologies, including private backbone design, inter-region connectivity, routing policy, and traffic engineering.
- Comfortable working with Linux-based systems, network operating systems, and automation tooling.
- Strong background in network observability, incident management, capacity forecasting, and change control.
- Clear written and verbal communication skills, with the ability to align stakeholders and lead teams through complex technical challenges.
- Successful completion of a background check.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
network design, spine-leaf architecture, BGP routing, VXLAN, EVPN, InfiniBand, RoCE, DDoS resilience, Linux-based systems, network automation
Soft Skills
mentorship, technical guidance, communication, team leadership, stakeholder alignment, incident management, capacity planning, change management, problem-solving, collaboration