Create documentation detailing how the SRE function is established within the platform, supported by procedures that incorporate existing documentation and set out the guidelines to follow
Provide a framework under which to operate the cloud systems
Lead the transition to cloud infrastructure and improve observability across systems
Identify and eliminate toil through automation
Manage incidents and post-mortems to improve service reliability
Mentor engineers and support team development
Collaborate with Product Owners to balance operational and development priorities
Requirements
Proven experience as a Site Reliability Engineer in cloud environments (GCP or AWS)
Understanding of SRE principles including SLIs, SLOs, error budgets, and toil reduction.
Strong scripting and infrastructure-as-code (IaC) skills (Terraform, Harness, GitHub)
Demonstrable experience of Agile ways of working, with a focus on delivering customer value and applying an Agile mindset; familiarity with tools such as Jira
Ability to lead incident response and drive service improvements
Strong collaboration and mentoring skills
Benefits
A generous pension contribution of up to 15%
An annual performance-related bonus
Share schemes including free shares
Benefits you can adapt to your lifestyle, such as discounted shopping
30 days’ holiday, with bank holidays on top
A range of wellbeing initiatives and generous parental leave policies