Senior Site Reliability Engineer

Red Hat

full-time

Posted on: 9/18/2025

Location: Colorado • 🇺🇸 United States

Salary

💰 $111,260 - $183,580 per year

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Chef, Cloud, Distributed Systems, DNS, Docker, Go, Google Cloud Platform, Java, Kubernetes, Linux, OpenShift, Open Source, Prometheus, Puppet, Python, TCP/IP, Unix

About the role

Develop, scale, and operate Red Hat OpenShift managed cloud services
Contribute code to increase the scalability and reliability of the service
Contribute software tests and participate in peer review to increase code quality
Help peers develop their capabilities through knowledge sharing, mentoring, and collaboration
Participate in a regular on-call schedule, including occasional paid weekends and holidays
Practice sustainable incident response and blameless postmortems
Resolve customer issues escalated from the Red Hat Global Support team
Work within a small agile team to develop and improve SRE software, support peers, plan, and self-improve
Collaborate with cross-functional teams to identify opportunities for AI integration within the software development lifecycle
Enable customer self-service, make monitoring more sustainable, and eliminate manual work through automation

Requirements

A bachelor's degree in Computer Science or a related technical field is required (hands-on experience may be considered in lieu of a degree)
Some experience programming in at least one of Python, Golang, Java, C, C++, or a similar language
Experience working with public clouds such as AWS, GCP, or Azure
Ability to collaboratively troubleshoot and solve problems in a team setting
Experience troubleshooting an as-a-service offering (SaaS, PaaS, etc.) and working with complex distributed systems is a plus
Direct experience with Kubernetes or OpenShift is a plus
Demonstrated ability to debug and optimize code and to automate routine tasks
Basic understanding of Unix/Linux operating systems
5+ years of experience managing Linux servers running Red Hat Enterprise Linux (RHEL), CentOS, or Fedora hosted at a cloud provider such as AWS, GCE, or Azure
3+ years of experience with enterprise systems monitoring; knowledge of Prometheus is a plus
3+ years of experience with enterprise configuration management software like Ansible, Puppet, or Chef
2+ years of experience programming with at least one object-oriented language; Golang, Java, or Python are preferred
2+ years of experience delivering a hosted service
Demonstrated ability to quickly and accurately troubleshoot system issues
Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP
Solid communication skills and experience working directly with and presenting to customers
1+ year(s) of experience with Kubernetes is a plus
1+ year(s) of experience with Docker-based containers is a plus
Willingness to participate in an on-call rotation supporting US West Coast hours