Tech Stack
Airflow, Ansible, Apache, AWS, Azure, Chef, Cloud, Cyber Security, Distributed Systems, DNS, Google Cloud Platform, Grafana, HAProxy, Jenkins, Kafka, Kubernetes, NGINX, Open Source, Oracle, Prometheus, Puppet, SaltStack, Spark, Terraform
About the role
- Build software and systems to manage platform infrastructure and applications
- Support CrowdStrike’s primary CI/CD build tools
- Build automation solutions for service deployment
- Monitor availability and take a holistic view of system health
- Improve reliability, quality, and serviceability of systems
- Design the architecture of highly available services at enterprise scale
- Interact with internal customers to understand needs and develop solutions
- Champion Incident Response and the Production Readiness Review (PRR) for our organization
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and root cause analysis
- Forecast resource, capacity, and license needs
- Partner with other engineers and foster a mentorship mentality
- Configure and optimize load balancers (NGINX, HAProxy, Envoy, etc.)
- Databases (Relational and Non-Relational)
- Key-value stores and message brokers (etcd, Kafka, Redpanda, etc.)
Requirements
- On-premises and cloud expertise in deploying, scaling, and maintaining the following services:
  - CI/CD tools: Bazel, GitHub Actions, Jenkins
  - IaC provisioning tools: Ansible, Chef, Puppet, Salt, Terraform
  - Source code management services: Bitbucket, GitLab, GitHub
  - Monitoring and observability tooling: open-source Prometheus/Grafana, Datadog, Honeycomb, New Relic
- Experience with deploying applications on Kubernetes at scale
- 5+ years of experience working in a large-scale production environment
- Proven ability to work effectively with both local and remote teams
- Attention to detail and the ability to make good, timely decisions
- Ability to make trade-offs between short-term and long-term goals
- Demonstrated ability to self-learn and take initiative in a fast-paced, quickly changing environment
- Security-first mindset and a general understanding of cybersecurity principles
- Bonus Points:
  - Experience adding and integrating AI into existing workflows
  - Familiarity with networking patterns (load balancers, DNS, VIPs, routing, firewall rules)
  - Knowledge of multiple cloud services (AWS/GCP/Azure/Oracle)
  - Experience with data science principles and tooling such as Apache Airflow, Apache Spark, etc.
  - Experience creating automated reports on a monthly/quarterly/annual basis