Red Hat

Senior Site Reliability Engineer

full-time

Location: Colorado • 🇺🇸 United States

Salary

💰 $111,260 - $183,580 per year

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Chef, Cloud, Distributed Systems, DNS, Docker, Go, Google Cloud Platform, Java, Kubernetes, Linux, OpenShift, Open Source, Prometheus, Puppet, Python, TCP/IP, Unix

About the role

  • Develop, scale, and operate Red Hat OpenShift managed cloud services
  • Contribute code to increase the scalability and reliability of the service
  • Contribute software tests and participate in peer review to increase code quality
  • Help peers and develop their capabilities through knowledge sharing, mentoring, and collaboration
  • Participate in a regular on-call schedule, including occasional paid weekends and holidays
  • Practice sustainable incident response and blameless postmortems
  • Resolve customer issues escalated from the Red Hat Global Support team
  • Work within a small agile team to develop and improve SRE software, support peers, plan, and self-improve
  • Collaborate with cross-functional teams to identify opportunities for AI integration within the software development lifecycle
  • Enable customer self-service, make monitoring more sustainable, and eliminate manual work through automation

Requirements

  • Bachelor's degree in Computer Science or a related technical field (hands-on experience may be considered in lieu of a degree)
  • Some experience programming in at least one of Python, Golang, Java, C, or C++, or in a comparable language
  • Experience working with public clouds such as AWS, GCP, or Azure
  • Ability to collaboratively troubleshoot and solve problems in a team setting
  • Experience troubleshooting an as-a-service offering (SaaS, PaaS, etc.) and working with complex distributed systems is a plus
  • Direct experience with Kubernetes or OpenShift is a plus
  • Demonstrated ability to debug and optimize code, and to automate routine tasks
  • Basic understanding of Unix/Linux operating systems
  • 5+ years of experience managing Linux servers running Red Hat Enterprise Linux (RHEL), CentOS, or Fedora hosted at a cloud provider such as AWS, GCP, or Azure
  • 3+ years of experience with enterprise systems monitoring; knowledge of Prometheus is a plus
  • 3+ years of experience with enterprise configuration management software like Ansible, Puppet, or Chef
  • 2+ years of experience programming in at least one object-oriented language; Golang, Java, or Python is preferred
  • 2+ years of experience delivering a hosted service
  • Demonstrated ability to quickly and accurately troubleshoot system issues
  • Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP
  • Solid communication skills and experience working directly with and presenting to customers
  • 1+ year(s) of experience with Kubernetes is a plus
  • 1+ year(s) of experience with Docker-based containers is a plus
  • Willingness to participate in an on-call rotation supporting US West Coast hours