CrowdStrike

Manager, Network Reliability Engineering

full-time

Origin: 🇺🇸 United States • Virginia

Salary

💰 $140,000 - $215,000 per year

Job Level

Senior, Lead

Tech Stack

AWS, Cloud, Go, Google Cloud Platform, Perl, Python

About the role

  • Set the direction for and improve the reliability and efficiency of the network
  • Contribute to maintaining a high-performance, fault-tolerant, and scalable network
  • Define metrics, develop tools, and create approaches that improve how we monitor and operate the network
  • Develop, track, and report on KPIs and metrics that measure network capacity, performance, and availability
  • Build tools and monitoring systems that provide granular, real-time observability
  • Develop automation to continuously assess and detect suboptimal network state and identify potential points of failure
  • Review designs and traffic patterns to continually assess network capacity and availability
  • Work with other engineering groups to close the feedback loop on areas for improvement
  • Lead resolution of network incidents, conduct internal post-mortems, perform root cause analysis, and ensure corrective actions are taken in a timely manner
  • Diagnose and solve complex network and application problems, and recommend improvements
  • Participate in a 24x7 on-call rotation

Requirements

  • United States citizenship or permanent residency is required to retain access to resources for this role (no clearance necessary)
  • 7+ years deploying and managing network infrastructure
  • Experience leading a sustaining engineering or SRE team
  • 7+ years of experience working with network protocols and architectures such as BGP, MPLS (TE, Auto-BW), VXLAN, EVPN, and Clos
  • Experience building and maintaining network monitoring and graphing tools, as well as streaming telemetry
  • Programming experience in Python, Perl, Go, or another scripting language
  • Experience with cloud providers such as AWS and GCP
  • Ability to participate in a 24x7 on-call rotation
  • Willingness to periodically undergo and pass additional background and fingerprint check(s) consistent with government customer requirements
  • (Bonus) Strong track record of developing and improving tools, platforms, and infrastructure
  • (Bonus) Experience with network simulation and testing tools (NS-3, NetSim, Batfish, Ixia)
  • (Bonus) Production-level experience supporting large-scale network infrastructure
  • (Bonus) Experience automating systems to reduce operational toil