Delivering against GCP and SRE Public Cloud technology roadmaps
Collaborating with engineering teams to release and evolve enterprise-class solutions
Managing operations of critical banking services, including 24x7 coverage via on-call rota
Enhancing resiliency and reliability of customer-facing services
Troubleshooting and diagnosing issues with an engineering mindset
Building tooling to support service reliability and code quality
Working across multiple labs and signature projects in the Digital space
Leading Chaos Engineering initiatives to stress test services
Requirements
Strong understanding of SRE & DevOps, including experience of Infrastructure as Code and CI/CD pipelines using tools such as Azure DevOps, Terraform, or Jenkins.
Proficiency with Incident Management software (e.g. ServiceNow).
Proficiency in Dynatrace, Splunk, SRE on GCP, and Cloud Observability.
Demonstrable experience using orchestration tools such as Harness.
Knowledge of GCP and Azure cloud platforms.
Experience in identifying toil and designing automated solutions to remove it.
Reliability & Performance Management: Design, implement and own the SLOs for critical platform services. Monitor system health, manage error budgets, and drive improvements in Mean Time to Failure (MTTF) and Mean Time to Recovery (MTTR).
Incident & Problem Management: Lead incident response and post-mortem analysis. Ensure root cause identification and long-term remediation strategies are implemented.
Platform Advocacy & Collaboration: Champion SRE principles across the Segments & Propositions Lab. Collaborate with Lab Product Owners, Engineering Leads, and application teams to embed reliability into design and delivery.
Technical Leadership: Provide technical oversight across cloud infrastructure, CI/CD pipelines, observability tooling, and automation frameworks. Guide engineers in adopting scalable and resilient solutions.
Continuous Improvement: Identify and implement improvements in deployment, monitoring, and alerting processes. Drive automation to reduce toil and improve operational efficiency.
Governance & Compliance: Ensure platform services adhere to internal risk, security, and compliance standards. Support audit and regulatory reporting requirements.
Benefits
A generous pension contribution of up to 15%
An annual performance-related bonus
Share schemes including free shares
Benefits you can adapt to your lifestyle, such as discounted shopping
30 days’ holiday, with bank holidays on top
A range of wellbeing initiatives and generous parental leave policies
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
GCP, SRE, DevOps, Infrastructure as Code, CI/CD, Terraform, Azure DevOps, Dynatrace, Splunk, Cloud Observability