
Senior Site Reliability Engineer
Red Hat
full-time
Location: Colorado • 🇺🇸 United States
Salary
💰 $111,260 - $183,580 per year
Job Level
Senior
Tech Stack
Ansible, AWS, Azure, Chef, Cloud, Distributed Systems, DNS, Docker, Go, Google Cloud Platform, Java, Kubernetes, Linux, OpenShift, Open Source, Prometheus, Puppet, Python, TCP/IP, Unix
About the role
- Develop, scale, and operate Red Hat OpenShift managed cloud services
- Contribute code to increase the scalability and reliability of the service
- Contribute software tests and participate in peer review to increase code quality
- Help and develop peers’ capabilities through knowledge sharing, mentoring, and collaboration
- Participate in a regular on-call schedule, including occasional paid weekends and holidays
- Practice sustainable incident response and blameless postmortems
- Resolve customer issues escalated from the Red Hat Global Support team
- Work within a small agile team to develop and improve SRE software, support peers, plan, and self-improve
- Collaborate with cross-functional teams to identify opportunities for AI integration within the software development lifecycle
- Enable customer self-service, make monitoring more sustainable, and eliminate work through automation
Requirements
- A bachelor's degree in Computer Science or a related technical field required (hands-on experience may be considered in lieu of degree)
- Some experience programming in at least one of: Python, Golang, Java, C, C++, or another object-oriented language
- Experience working with public clouds such as AWS, GCP, or Azure
- Ability to collaboratively troubleshoot and solve problems in a team setting
- Experience troubleshooting an as-a-service offering (SaaS, PaaS, etc.) and working with complex distributed systems is a plus
- Direct experience with Kubernetes or OpenShift is a plus
- Demonstrated ability to debug, optimize code and automate routine tasks
- Basic understanding of Unix/Linux operating systems
- 5+ years of experience managing Linux servers running Red Hat Enterprise Linux (RHEL), CentOS, or Fedora hosted at a cloud provider such as AWS, GCE, or Azure
- 3+ years of experience with enterprise systems monitoring; knowledge of Prometheus is a plus
- 3+ years of experience with enterprise configuration management software like Ansible, Puppet, or Chef
- 2+ years of experience programming with at least one object-oriented language; Golang, Java, or Python are preferred
- 2+ years of experience delivering a hosted service
- Demonstrated ability to quickly and accurately troubleshoot system issues
- Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP
- Solid communication skills and experience working directly with and presenting to customers
- 1+ year(s) of experience with Kubernetes is a plus
- 1+ year(s) of experience with Docker-based containers is a plus
- Willingness to participate in an on-call rotation covering US West Coast hours