Principal Site Reliability Engineer

Red Hat

Site Reliability Engineer maintaining Red Hat’s hybrid cloud infrastructure and supporting software manufacturing services, collaborating across teams to enhance service resilience and infrastructure health.

Posted 5/15/2026 • full-time • Pune • 🇺🇸 United States • Lead

Tech Stack

Tools & technologies

Ansible, AWS, Cloud, Grafana, Jenkins, Linux, OpenShift, Open Source, Prometheus, Terraform

About the role

Key responsibilities & impact

Be part of a globally distributed team, offering 24x7 support through a service model that leverages different time zones to extend coverage with regular on-call rotations
Resolve service incidents using existing operating procedures, investigate outage causes, and coordinate incident resolution across service teams
Lead and mentor less experienced colleagues, drive continuous improvement ideas, and help the team benefit from evolving technology, such as AI tools
Collaborate on incident retrospective reviews and the implementation of corrective items
Configure and maintain service infrastructure
Proactively identify and eliminate toil by automating manual, repetitive, and error-prone processes
Coordinate your actions with other Red Hat teams, such as IT Platforms, Infrastructure, Storage, and Network, and ensure the cloud deployment of our services meets quality expectations
Implement monitoring, alerting and escalation plans in the event of an infrastructure outage or performance problem
Work with service owners to co-define and implement SLIs and SLOs for the services you will support, ensure they are met, and execute remediation plans when they are not

Requirements

What you’ll need

Expert knowledge of OpenShift administration and application development
Linux administration expertise
Advanced knowledge of automation services: ArgoCD, Ansible or Terraform
Advanced knowledge of CI/CD platforms: Tekton and Pipelines as Code (optionally GitHub Actions or Jenkins)
Advanced knowledge and experience with monitoring platforms and technologies
General knowledge of AWS technologies
Ability to understand graphically represented concepts and architectures in documentation
Experience with creation of Standard Operating Procedures
Knowledge of open source monitoring technologies (Grafana, Prometheus, OpenTelemetry)
Excellent written and verbal communication skills in English

Benefits

Comp & perks

Inclusion at Red Hat
Equal Opportunity Workplace
Support for individuals with disabilities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

OpenShift administration, Linux administration, ArgoCD, Ansible, Terraform, CI/CD, Tekton, GitHub Actions, Jenkins, Grafana

Soft Skills

leadership, mentoring, continuous improvement, collaboration, problem-solving, communication