Red Hat

Associate Manager, Site Reliability Engineering

full-time

Location Type: Remote

Location: Australia

About the role

  • Lead and grow a team of SREs maintaining the overall health of OpenShift hosted properties
  • Own the health, reliability and availability of OpenShift hosted properties
  • Provide coaching, oversight and escalation support to the regional team of SREs
  • Ensure that incidents are managed and resolved quickly, and that retrospectives and root-cause analyses are completed within expected timelines
  • Oversee the creation and maintenance of knowledge articles and standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in the environment
  • Manage regional shift schedules, ensuring 24x7 resource availability
  • Participate in sprint planning and release cycles of SRE tooling
  • Schedule maintenance windows, considering customer and SRE resource requirements
  • Coordinate with teams across the organization to reduce operational friction and automate wherever possible
  • Resolve customer issues in cooperation with Red Hat's global customer support team
  • Identify and advocate for resources (e.g., training, licenses for new tools, dedicated time for exploration) to support the team's ongoing AI literacy and adoption
  • Ensure your team understands and applies guidelines for the ethical use of AI, addressing concerns such as data privacy, bias mitigation, intellectual property, and responsible disclosure
  • Foster a safe environment for experimentation and learning with AI technologies by supporting projects and experiments that encourage efficiency and simplicity, such as automating repetitive tasks, analyzing code metrics, or improving development processes; support the team in testing and implementing ideas quickly and in recovering from failures

Requirements

  • 1+ years of experience managing engineering teams
  • Must be comfortable managing distributed, remote staff
  • Ability to understand and discuss deep technical issues with engineers
  • Demonstrated experience with contemporary project management methodologies such as Agile, Kanban, and/or Scrum
  • 1+ years of experience with cloud providers such as Amazon Web Services (AWS), Google Compute Engine (GCE), or Microsoft Azure
  • 1+ years of experience with Kubernetes is a plus
  • 1+ years of experience with Docker-based containers is a plus

Benefits

  • Flexible working hours
  • Professional development opportunities