Salary
💰 $160,000 - $210,000 per year
Tech Stack
Ansible, Grafana, Prometheus, Python, Switching, Terraform
About the role
- Implement and maintain high-throughput, low-latency networks supporting AI Factory workloads and distributed training infrastructure.
- Work hands-on to deploy, configure, and troubleshoot routing, switching, optics, and interconnect systems across data centers.
- Operate and optimize layer 2/3 network services: BGP, EVPN/VXLAN, OSPF, MPLS, QoS, and ACLs.
- Work with InfiniBand networking systems and NVIDIA Fabric Manager (UFM).
- Develop and maintain network automation (e.g., Ansible, Python, Terraform) for provisioning, compliance, and operational workflows.
- Monitor network health and performance using telemetry tools and help scale observability platforms.
- Participate in the incident response rotation and perform root cause analysis on service-impacting events.
- Maintain configuration standards, documentation, and change management in line with infrastructure governance processes.
- Collaborate with the Principal Network Engineer on architectural decisions and vendor evaluations.
Requirements
- 5–8+ years of hands-on experience in large-scale network engineering, data center networks, or service provider infrastructure
- Strong knowledge of IP networking, BGP, OSPF, EVPN/VXLAN, and L2/L3 design principles
- Experience configuring and operating Arista, Juniper, or Cisco platforms in production environments
- Proficiency in scripting or automation (e.g., Python, Bash, Ansible)
- Solid troubleshooting skills and experience with real-time diagnostics and packet analysis
- Familiarity with monitoring and telemetry tools (e.g., Prometheus, Grafana, sFlow, InfluxDB)
Benefits
- Comprehensive health insurance with 100% of premiums covered by Voltage Park
- 5% 401(k) match
- Equity package
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
network engineering, BGP, OSPF, EVPN, VXLAN, MPLS, QoS, ACLs, network automation, scripting
Soft skills
troubleshooting, collaboration, incident response, root cause analysis, documentation, change management