RunPod

Manager, Datacenter Network Engineering

full-time

Location Type: Remote

Location: United States

Salary

💰 $150,000 - $240,000 per year

About the role

  • Manage and grow a team of network engineers responsible for datacenter fabrics, interconnects, and global WAN connectivity. Provide mentorship, technical guidance, and clear ownership boundaries.
  • Define and evolve network designs for GPU-heavy clusters, including spine-leaf topologies, ECMP routing, and high-bandwidth east-west traffic patterns.
  • Oversee design and operation of InfiniBand and RoCE-based fabrics supporting distributed training and inference workloads. Ensure performance, loss characteristics, and congestion control meet AI workload requirements.
  • Guide implementation and operation of network overlay technologies such as VXLAN with an EVPN control plane, Geneve, or similar, enabling scalable multi-tenant isolation and flexible network provisioning.
  • Lead strategy and execution for global WAN connectivity, including private backbone links, internet exchange (IX) peering, and hybrid interconnects with cloud providers and partners.
  • Establish operational best practices for monitoring, capacity planning, change management, incident response, and post-mortems across the network stack.
  • Partner closely with Infrastructure, SRE, Hardware, and Product Engineering teams to ensure network capabilities align with platform and customer requirements.
  • Work with hardware vendors, colocation providers, and transit partners on network design, procurement, deployment timelines, and escalations.
  • Ensure network designs support secure isolation, DDoS resilience, and compliance requirements without compromising performance.

Requirements

  • 3+ years managing network or infrastructure engineering teams, with experience scaling teams and systems in production environments.
  • 8+ years designing and operating large-scale datacenter networks, including spine-leaf architectures, BGP-based routing, and high-throughput fabrics.
  • Strong hands-on experience with VXLAN/EVPN or equivalent overlay technologies, including control-plane and data-plane considerations.
  • Proven experience with InfiniBand and/or RoCE, including congestion management, lossless Ethernet concepts, and performance tuning for GPU workloads.
  • Deep familiarity with global WAN technologies, including private backbone design, inter-region connectivity, routing policy, and traffic engineering.
  • Comfortable working with Linux-based systems, network operating systems, and automation tooling.
  • Strong background in network observability, incident management, capacity forecasting, and change control.
  • Clear written and verbal communication skills, with the ability to align stakeholders and lead teams through complex technical challenges.
  • Successful completion of a background check.
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
network design, spine-leaf architecture, BGP routing, VXLAN, EVPN, InfiniBand, RoCE, DDoS resilience, Linux-based systems, network automation
Soft Skills
mentorship, technical guidance, communication, team leadership, stakeholder alignment, incident management, capacity planning, change management, problem-solving, collaboration