Crusoe

Staff Network Operations Engineer

full-time

Origin: 🇺🇸 United States • California

Salary

💰 $204,000 - $247,000 per year

Job Level

Lead

Tech Stack

Ansible, AWS, Azure, Chef, Cloud, Google Cloud Platform, Puppet, Python, TCP/IP

About the role

  • Own operations for the global Crusoe Cloud network, keeping it available through monitoring and applying break-fixes during outages
  • Design, build, and operate the global edge, backbone, and data center network for High-Performance Compute (HPC) Clusters with GPUs
  • Manage and optimize Crusoe Energy Cloud's global network, including edge, backbone, data center, and public cloud connectivity
  • Collaborate with Network Engineering and cross-functional teams including Software Infrastructure and Product to drive network innovation and evolution
  • Lead operational excellence initiatives: develop monitoring, alerting, and self-healing systems to ensure high network availability
  • Perform advanced troubleshooting and root cause analysis for incidents, guiding post-mortem reviews and improvements
  • Mentor network engineers and establish best practices for incident response, documentation, and operational readiness
  • Participate in 24/7 on-call support for the Crusoe Network

Requirements

  • 10+ years of related experience building and operating at scale in a production environment
  • In-depth knowledge of network protocols including TCP/IP, QoS, BGP, OSPF/IS-IS, EVPN, VXLAN, and MPLS-related technologies such as RSVP-TE and LDP
  • Good understanding of network monitoring protocols and tools, such as SNMP, IPFIX, sFlow/NetFlow, and telemetry
  • Experience with tools such as Kentik, Arbor, ThousandEyes, Catchpoint, and Packet Design
  • Familiarity with data center network architectures such as fat-tree and Clos, as well as BGP-TE and edge peering
  • Hands-on experience with network devices from major vendors such as Mellanox, Cisco, Arista, and Juniper
  • Familiarity with mainstream commercial switch/router chipsets, such as Broadcom and Barefoot
  • Familiarity with technologies such as RDMA, InfiniBand, and RoCE (plus)
  • In-depth knowledge of public cloud architectures and connectivity options for AWS, GCP, Azure, Alibaba Cloud, and OCI
  • Good understanding of IPv6 and IPv4-IPv6 coexistence technologies
  • Programming/scripting in Python, Ansible, Puppet, Chef, or other languages (plus)
  • Self-motivated, with good communication and writing skills
  • Team player who participates in the Crusoe Energy Cloud network's global on-call rotation
  • Bachelor's in Computer Science, Information Science, Engineering, Mathematics, or a related field, or experience equivalent to a Bachelor's degree based on three or more years of work experience