Site Reliability Engineer, Level 3

Granicus

Posted 4/7/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level, Senior

Tech Stack

Tools & technologies

Ansible, AWS, Azure, Chef, Cloud, Distributed Systems, Docker, Go, Grafana, Java, Kubernetes, Linux, NoSQL, Prometheus, Puppet, Python, Ruby, Splunk, SQL, Terraform, Unix

About the role

Key responsibilities & impact

Provide production support on a shift according to the team on-call roster
Work on tickets raised by customers and by internal engineering/implementation teams when not on call for production support
Continuously monitor the health and performance of our services, systems, and infrastructure
Respond to alerts and incidents promptly to ensure high availability
Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention
Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence
Participate in designing and implementing system improvements to enhance reliability, scalability, and performance
Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes
Create and maintain documentation for processes, procedures, and troubleshooting guides
Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth
Implement and adhere to security best practices to protect our systems and data

Requirements

What you’ll need

Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
5+ years of experience in site reliability engineering, system administration, or a similar role, with a proven track record of managing large-scale, high-availability systems
Good understanding of Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
Experience with scripting languages such as Python, Bash, or Ruby
Expertise in Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
Proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++)
Advanced knowledge of monitoring and logging tools (Prometheus, Grafana, Splunk), configuration management (Ansible, Chef, Puppet), and CI/CD pipelines
Strong analytical and problem-solving skills with the ability to diagnose and resolve complex issues efficiently
Excellent verbal and written communication skills, with the ability to convey complex technical concepts to non-technical stakeholders
Demonstrated ability to lead and mentor a team, drive projects to completion, and manage cross-functional initiatives
Relevant certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or similar
In-depth understanding of containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation)
Experience with database management (SQL, NoSQL), load balancing, and distributed systems

Benefits

Comp & perks

Employee Resource Groups to encourage diverse voices
Coffee with Mark sessions – Our employees get to interact with our CEO on important issues ranging from mental health to work-life balance and current affairs
Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

site reliability engineering, system administration, Linux, Unix, cloud services, Python, Bash, Ruby, Go, Java

Soft Skills

analytical skills, problem-solving skills, communication skills, leadership, mentoring, project management, cross-functional collaboration

Certifications

AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer