
Lead Cloud Site Reliability Engineer
Lloyds Banking Group
full-time
Location Type: Hybrid
Location: Manchester • United Kingdom
Salary
💰 £92,701 - £109,060 per year
About the role
- Lead, coach and develop a high‑performing SRE team, fostering autonomy, inclusion and continuous improvement.
- Partner with Product Owners and Engineering Leads to embed reliability into roadmaps, backlogs and delivery decisions.
- Apply SRE principles (SLIs, SLOs, error budgets) to ensure our services remain highly reliable, performant and scalable.
- Drive improvements in observability—across metrics, logs, traces and events—ensuring services are observable by design.
- Use Dynatrace as the primary observability platform for key dashboards and customer‑centric alerting.
- Own Infrastructure‑as‑Code and CI/CD‑based environments, implementing enhancements and responding to operational change.
- Lead coordination of incident response and root cause analysis, supporting teams through major incidents, post‑incident reviews and prevention of recurrence.
- Collaborate with multi‑disciplinary engineering teams to remove technical impediments, reduce toil and improve service operability.
- Contribute hands‑on engineering where needed, validating technical decisions and guiding best practice.
- Bring curiosity, experimentation and first‑principles thinking to help evolve our engineering culture.
Requirements
- Proven experience applying SRE practices within Azure, GCP, or both.
- Strong understanding of SLIs, SLOs, error budgets, and how to use these to guide product and engineering decisions.
- Experience ensuring reliability of production services, including availability, performance and recoverability.
- Hands‑on or leadership experience in incident and problem management, focused on reducing MTTR and avoiding repeat issues.
- Background in software engineering or cloud engineering, with a good understanding of modern SDLC practices.
- Practical experience with DevOps, CI/CD and automation to improve service reliability.
- Experience improving observability on complex, distributed systems.
- Ability to use data to influence prioritisation and balance reliability with feature delivery.
- Collaboration and communication skills, with the ability to work effectively with product, engineering and platform teams.
- Experience mentoring engineers and promoting an inclusive, supportive team culture.