CrowdStrike

Reliability Engineer III

full-time

Origin: 🇮🇳 India

Job Level

Mid-Level, Senior

Tech Stack

Airflow, Ansible, Apache, AWS, Azure, Chef, Cloud, Cyber Security, Distributed Systems, DNS, Google Cloud Platform, Grafana, HAProxy, Jenkins, Kafka, Kubernetes, NGINX, Open Source, Oracle, Prometheus, Puppet, SaltStack, Spark, Terraform

About the role

  • Build software and systems to manage platform infrastructure and applications
  • Support CrowdStrike’s primary CI/CD build tools
  • Build automation solutions for service deployment
  • Monitor availability and take a holistic view of system health
  • Improve reliability, quality, and serviceability of systems
  • Architect highly available services at enterprise scale
  • Interact with internal customers to understand needs and develop solutions
  • Champion Incident Response and Production Readiness Reviews (PRRs) for our organization
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and root cause analysis
  • Forecast resource, capacity, and license needs
  • Partner with other engineers and foster a mentorship mentality
  • Configure and optimize load balancers (NGINX, HAProxy, Envoy, etc.)
  • Configure and optimize databases (relational and non-relational)
  • Configure and optimize key-value stores and message brokers (etcd, Kafka, Redpanda, etc.)

Requirements

  • On-premises and cloud expertise deploying, scaling, and maintaining the following services:
      • CI/CD tools: Bazel, GitHub Actions, Jenkins
      • IaC provisioning tools: Ansible, Chef, Puppet, Salt, Terraform
      • Source code management services: Bitbucket, GitLab, GitHub
      • Monitoring and observability tooling: open-source Prometheus/Grafana, Datadog, Honeycomb, New Relic
  • Experience with deploying applications on Kubernetes at scale
  • 5+ years of experience working in a large-scale production environment.
  • Proven ability to work effectively with both local and remote teams
  • Attention to detail and the ability to make good, timely decisions
  • Ability to make trade-offs between short-term and long-term goals
  • Demonstrated ability to learn independently and take initiative in a fast-paced, quickly changing environment
  • Security-first mindset and a general understanding of cybersecurity principles
  • Bonus points:
      • Experience adding and integrating AI into existing workflows
      • Familiarity with networking patterns (load balancers, DNS, VIPs, routing, firewall rules)
      • Knowledge of multiple cloud services (AWS, GCP, Azure, Oracle)
      • Experience with data science principles and tooling such as Apache Airflow, Apache Spark, etc.
      • Experience creating automated reporting on a monthly, quarterly, or annual basis