Salary
💰 $204,000 - $247,000 per year
Tech Stack
Ansible, AWS, Azure, Chef, Cloud, Google Cloud Platform, Puppet, Python, TCP/IP
About the role
- Own operations for the global Crusoe Cloud network, keeping it available through monitoring and break-fix response to outages
- Design, build, and operate the global edge, backbone, and data center network for High-Performance Compute (HPC) Clusters with GPUs
- Manage and optimize Crusoe Energy Cloud's global network, including edge, backbone, data center, and public cloud connectivity
- Collaborate with Network Engineering and cross-functional teams including Software Infrastructure and Product to drive network innovation and evolution
- Lead operational excellence initiatives—develop monitoring, alerting, and self-healing systems to ensure high network availability
- Perform advanced troubleshooting and root cause analysis for incidents, guiding post-mortem reviews and improvements
- Mentor network engineers and establish best practices for incident response, documentation, and operational readiness
- Participate in 24/7 on-call support for the Crusoe Network
Requirements
- 10+ years of related experience building and operating networks at scale in a production environment
- In-depth knowledge of network protocols including TCP/IP, QoS, BGP, OSPF/IS-IS, EVPN, VXLAN, and MPLS-related technologies like RSVP-TE and LDP
- Good understanding of network monitoring protocols and tools, such as SNMP, IPFIX, sFlow/NetFlow, and telemetry
- Experience with tools like Kentik, Arbor, ThousandEyes, Catchpoint, Packet Design
- Familiar with data center network architecture, such as Fat Tree architecture, Clos, BGP-TE, and peering for edge
- Hands-on experience with network devices from major vendors like Mellanox, Cisco, Arista, and Juniper
- Familiar with mainstream commercial switch/router chipsets, such as Broadcom, Barefoot
- Familiarity with technologies like RDMA, InfiniBand, and RoCE (plus)
- In-depth knowledge of public cloud architecture and connectivity options for AWS, GCP, Azure, Ali Cloud, and OCI
- Good understanding of IPv6 and IPv4-IPv6 coexistence technologies
- Programming/scripting in Python, Ansible, Puppet, Chef, or other languages (plus)
- Self-motivated, with good communication and writing skills
- Team player willing to participate in the Crusoe Energy Cloud global network on-call rotation
- Bachelor's in Computer Science, Information Science, Engineering, Mathematics, or a related field, or experience equivalent to a Bachelor's degree based on three or more years of work experience