
Site Reliability Engineer, Level 3
Granicus
full-time
Location Type: Remote
Location: United States
About the role
- Provide production support in shifts according to the team's on-call roster
- Work on tickets raised by customers and by internal engineering and implementation teams when not on call for production support
- Continuously monitor the health and performance of our services, systems, and infrastructure
- Respond to alerts and incidents promptly to ensure high availability
- Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention
- Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence
- Participate in designing and implementing system improvements to enhance reliability, scalability, and performance
- Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes
- Create and maintain documentation for processes, procedures, and troubleshooting guides
- Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth
- Implement and adhere to security best practices to protect our systems and data
Requirements
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
- 5+ years of experience in site reliability engineering, system administration, or a similar role, with a proven track record of managing large-scale, high-availability systems
- Expertise in Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
- Proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++)
- Advanced knowledge of monitoring and logging tools (Prometheus, Grafana, Splunk), configuration management (Ansible, Chef, Puppet), and CI/CD pipelines
- Strong analytical and problem-solving skills with the ability to diagnose and resolve complex issues efficiently
- Excellent verbal and written communication skills, with the ability to convey complex technical concepts to non-technical stakeholders
- Demonstrated ability to lead and mentor a team, drive projects to completion, and manage cross-functional initiatives
- Relevant certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or similar
- In-depth understanding of containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation)
- Experience with database management (SQL, NoSQL), load balancing, and distributed systems
Benefits
- Employee Resource Groups to encourage diverse voices
- Coffee with Mark sessions: employees talk with our CEO about important issues ranging from mental health to work-life balance and current affairs
- Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more