Granicus

Site Reliability Engineer, Level 3

full-time

Location Type: Remote

Location: United States

About the role

  • Provide production support in shifts according to the team's on-call roster
  • Work on tickets raised by customers and by internal engineering and implementation teams when not on call for production support
  • Continuously monitor the health and performance of our services, systems, and infrastructure
  • Respond to alerts and incidents promptly to ensure high availability
  • Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention
  • Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence
  • Participate in designing and implementing system improvements to enhance reliability, scalability, and performance
  • Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes
  • Create and maintain documentation for processes, procedures, and troubleshooting guides
  • Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth
  • Implement and adhere to security best practices to protect our systems and data

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
  • 5+ years of experience in site reliability engineering, system administration, or a similar role, with a proven track record of managing large-scale, high-availability systems
  • Good understanding of Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
  • Experience with scripting languages such as Python, Bash, or Ruby
  • Expertise in Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
  • Proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++)
  • Advanced knowledge of monitoring and logging tools (Prometheus, Grafana, Splunk), configuration management (Ansible, Chef, Puppet), and CI/CD pipelines
  • Strong analytical and problem-solving skills with the ability to diagnose and resolve complex issues efficiently
  • Excellent verbal and written communication skills, with the ability to convey complex technical concepts to non-technical stakeholders
  • Demonstrated ability to lead and mentor a team, drive projects to completion, and manage cross-functional initiatives
  • Relevant certifications such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or similar
  • In-depth understanding of containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation)
  • Experience with database management (SQL, NoSQL), load balancing, and distributed systems

Benefits

  • Employee Resource Groups to encourage diverse voices
  • Coffee with Mark sessions – employees talk with our CEO about important issues ranging from mental health to work-life balance and current affairs
  • Microsoft Teams communities focused on wellness, art, furbabies, family, parenting, and more

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
site reliability engineering, system administration, Linux, Unix, cloud services, Python, Bash, Ruby, Go, Java
Soft Skills
analytical skills, problem-solving skills, communication skills, leadership, mentoring, project management, cross-functional collaboration
Certifications
AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer