Tech Stack
Ansible, AWS, Cloud, Docker, Google Cloud Platform, Kubernetes, Prometheus, Python, Splunk, Terraform
About the role
- Design, implement, and maintain monitoring solutions across infrastructure and applications
- Configure alerting thresholds and define/track SLOs and error budgets for critical services
- Create and maintain dashboards providing real-time visibility into system health
- Participate in on-call rotation, lead incident response, and conduct post-incident reviews
- Document incidents, resolutions, and lessons learned; refine incident response procedures
- Develop automation, scripts, and self-healing systems to remediate common issues
- Integrate monitoring tools with operational systems and CI/CD pipelines
- Collaborate with development, infrastructure, and security teams, and advise them on monitoring best practices
- Analyze monitoring data, implement metrics, and contribute to monitoring strategy improvements
Requirements
- 5+ years of experience with enterprise monitoring tools (Prometheus, LogicMonitor, Datadog, ThousandEyes, Zscaler Digital Experience (ZDX))
- Strong proficiency in scripting languages (Python, Bash, PowerShell) for automation
- Experience with log management platforms (ELK stack, Splunk, LogScale)
- Working knowledge of cloud services monitoring (AWS CloudWatch, GCP)
- Experience with application performance monitoring (APM), digital experience monitoring (DEM), and infrastructure monitoring
- Knowledge of SRE principles, SLOs, error budgets, and incident management
- Experience with automated alerting, remediation workflows, and CI/CD pipeline monitoring
- Familiarity with Infrastructure as Code (Terraform, Ansible) and containerization (Docker, Kubernetes)
- Strong incident triage, root cause analysis, and documentation skills
- Experience participating in on-call rotations and emergency response
- Shift timings: 12 PM to 9 PM IST
- Bonus: SRE, cloud platform, or monitoring tool certifications
- Bonus: ITIL Foundation certification
- Bonus: Bachelor's degree in Computer Science, Information Technology, or related field