Granicus

Site Reliability Engineer

full-time

Location Type: Hybrid

Location: Bangalore, India

About the role

  • Provide production support on a shift according to the team on-call roster.
  • Work on SRE projects and on tickets escalated by Tech Support or raised by internal engineering and implementation teams.
  • Monitor and maintain systems.
  • Respond to alerts and incidents promptly to ensure high availability.
  • Implement effective alerting & notifications, minimizing false alerts.
  • Create and manage effective SRE dashboards to report key business metrics, SLAs, SLOs, SLIs, and error budgets.
  • Proactively plan capacity to handle growth in scale and traffic load.
  • Actively participate in troubleshooting and resolving incidents, performing root cause analysis, conducting incident post-mortems, and implementing long-term fixes to prevent recurrence.
  • Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
  • Partner closely with DevOps and Software Engineering teams to enhance system reliability.
  • Create and maintain documentation for technology, architecture, processes, procedures, and troubleshooting guides.

Requirements

  • At least 8 years of relevant experience in site reliability engineering, with a proven track record of managing complex, medium- to large-scale high-availability systems.
  • Expertise in Monitoring/Observability - Elastic & CloudWatch/Azure Monitor
  • Expertise in Linux/Windows OS & networking
  • Advanced knowledge of Cloud services (AWS & Azure)
  • Advanced knowledge of Container Technologies - Docker & Kubernetes (K8s)
  • Proficiency in Databases/Queries - MSSQL, Postgres, MongoDB, MySQL
  • Proficiency in Scripting - Python/PowerShell/Bash
  • Working experience with CI/CD Tools - GitLab/Azure DevOps or similar tools
  • Working experience with IaC Tools - Terraform/Ansible
  • Working experience with Configuration Management - Chef
  • Working experience with Incident Response - PagerDuty, Jira
  • Relevant certifications such as Elastic Certified Observability Engineer, AWS Certified Solutions Architect, or Certified Kubernetes Administrator, or equivalent hands-on experience, are highly valued.

Benefits

  • Employee Resource Groups to encourage diverse voices
  • Coffee with Mark sessions
  • Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more