Site Reliability Engineer – Level 3

Granicus

Site Reliability Engineer modernizing reliability engineering at Granicus through observability and AI, improving service reliability and accelerating incident response in the GovTech industry.

Posted 7/3/2026 | Full-time | 🇮🇳 India | Mid-Level, Senior

Tech Stack

Tools & technologies

Ansible, AWS, Azure, Chef, Cloud, Distributed Systems, Go, Grafana, Java, Linux, Prometheus, Puppet, Python, Ruby, Terraform, Unix

About the role

Key responsibilities & impact

Granicus is seeking a Site Reliability Engineer (SRE 3) with strong AIOps capabilities to modernize reliability engineering through observability, automation, and AI-assisted operations
Improve service reliability, reduce operational toil, accelerate incident response, and help build scalable, resilient platforms supporting both traditional and AI/ML-powered workloads
Lead adoption of AI-first SRE practices across monitoring, incident response, and automation
Design and implement MCP-based integrations connecting systems like Elastic, Jira, and cloud platforms
Build and operationalize AI agents for SRE workflows (incident triage, RCA, alert summarization, runbooks)
Drive AIOps maturity: alert correlation, anomaly detection, assisted RCA
Develop predictive models for capacity, failures, and incidents
Provide production support in shifts, following the team's on-call roster
Monitor and maintain systems: proactively monitor the health and performance of our services, systems, and infrastructure, and respond to alerts and incidents promptly to ensure high availability
Ensure SREs are meeting or improving on established SLOs
Collaborate with cross-functional teams to prevent reliability issues

Requirements

What you’ll need

6+ years of experience in site reliability engineering, system administration, or a similar role
Strong expertise in Linux/Unix, networking, distributed systems, and cloud platforms such as AWS, Azure, or Google Cloud
Experience with scripting languages such as Python, Bash, or Ruby, and programming languages such as Go, Java, or C++
Advanced knowledge of cloud, monitoring, and observability tools (Elastic, Prometheus, Grafana, Pingdom)
Experience with infrastructure automation, CI/CD pipelines, and configuration tools such as Terraform, Ansible, Chef, or Puppet
Experience integrating AIOps capabilities into observability stacks (metrics, logs, traces) for intelligent alerting, noise reduction, and root cause analysis
Experience working with AI-assisted coding tools such as Cursor, GitHub Copilot, or similar developer copilots
Familiarity with Model Context Protocol (MCP) for integrating AI agents with enterprise systems (e.g., Jira, Elastic, cloud platforms)
Ability to design or leverage AI agents for SRE workflows (incident triage, RCA generation, alert summarization, runbook execution)
Experience building or integrating context-aware automation systems using MCP or similar frameworks
Certifications such as AWS Certified Solutions Architect, AWS Certified Machine Learning – Specialty, or Google Cloud Professional DevOps Engineer are a plus

Benefits

Comp & perks

Flexible work arrangements
Professional development opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Linux/Unix
Networking
Distributed Systems
Scripting Languages (Python, Bash, Ruby)
Programming Languages (Go, Java, C++)
AI-Assisted Coding Tools (Cursor, GitHub Copilot)
Model Context Protocol (MCP)
Incident Triage
Root Cause Analysis (RCA)
Capacity Prediction

Certifications

AWS Certified Solutions Architect
AWS Certified Machine Learning – Specialty
Google Cloud Professional DevOps Engineer