CrowdStrike

IT Monitoring Engineer – Site Reliability

full-time

Origin: 🇮🇳 India

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Cloud, Docker, Google Cloud Platform, Kubernetes, Prometheus, Python, Splunk, Terraform

About the role

  • Design, implement, and maintain monitoring solutions across infrastructure and applications
  • Configure alerting thresholds and define/track SLOs and error budgets for critical services
  • Create and maintain dashboards providing real-time visibility into system health
  • Participate in on-call rotation, lead incident response, and conduct post-incident reviews
  • Document incidents, resolutions, and lessons learned; refine incident response procedures
  • Develop automation, scripts, and self-healing systems to remediate common issues
  • Integrate monitoring tools with operational systems and CI/CD pipelines
  • Collaborate with development, infrastructure, and security teams, and advise them on monitoring best practices
  • Analyze monitoring data, implement metrics, and contribute to monitoring strategy improvements

Requirements

  • 5+ years of experience with enterprise monitoring tools (Prometheus, LogicMonitor, Datadog, ThousandEyes, Zscaler Digital Experience (ZDX))
  • Strong proficiency in scripting languages (Python, Bash, PowerShell) for automation
  • Experience with log management platforms (ELK stack, Splunk, LogScale)
  • Working knowledge of cloud services monitoring (AWS CloudWatch, GCP)
  • Experience with application performance monitoring (APM), digital experience monitoring (DEM), and infrastructure monitoring
  • Knowledge of SRE principles, SLOs, error budgets, and incident management
  • Experience with automated alerting, remediation workflows, and CI/CD pipeline monitoring
  • Familiarity with Infrastructure as Code (Terraform, Ansible) and containerization (Docker, Kubernetes)
  • Strong incident triage, root cause analysis, and documentation skills
  • Experience participating in on-call rotations and emergency response
  • Shift timings: 12 PM to 9 PM IST
  • Bonus: SRE, cloud platform, or monitoring tool certifications
  • Bonus: ITIL Foundation certification
  • Bonus: Bachelor's degree in Computer Science, Information Technology, or related field