Lloyds Banking Group

Lead Cloud Site Reliability Engineer

Full-time

Location Type: Hybrid

Location: Manchester, United Kingdom

Salary

💰 £92,701 - £109,060 per year

About the role

  • Lead, coach and develop a high‑performing SRE team, fostering autonomy, inclusion and continuous improvement.
  • Partner with Product Owners and Engineering Leads to embed reliability into roadmaps, backlogs and delivery decisions.
  • Apply SRE principles (SLIs, SLOs, error budgets) to ensure our services remain highly reliable, performant and scalable.
  • Drive improvements in observability—across metrics, logs, traces and events—ensuring services are observable by design.
  • Use Dynatrace as the primary observability platform, building meaningful dashboards and customer‑centric alerting.
  • Own Infrastructure‑as‑Code and CI/CD‑based environments, implementing enhancements and responding to operational change.
  • Lead coordination of incident response and root cause analysis, supporting teams through major incidents, post‑incident reviews and prevention of recurrence.
  • Collaborate with multi‑disciplinary engineering teams to remove technical impediments, reduce toil and improve service operability.
  • Contribute hands‑on engineering where needed, validating technical decisions and guiding best practice.
  • Bring curiosity, experimentation and first‑principles thinking to evolve our engineering culture.

Requirements

  • Proven experience applying SRE practices within Azure, GCP, or both.
  • Strong understanding of SLIs, SLOs, error budgets, and how to use these to guide product and engineering decisions.
  • Experience ensuring reliability of production services, including availability, performance and recoverability.
  • Hands‑on or leadership experience in incident and problem management, focused on reducing MTTR and avoiding repeat issues.
  • Background in software engineering or cloud engineering, with good understanding of modern SDLC practices.
  • Practical experience with DevOps, CI/CD and automation to improve service reliability.
  • Experience improving the observability of complex, distributed systems.
  • Ability to use data to influence prioritisation and balance reliability with feature delivery.
  • Collaboration and communication skills, working effectively with product, engineering and platform teams.
  • Experience mentoring engineers and promoting an inclusive, supportive team culture.