Tech Stack
AWS, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes
About the role
- Define and execute the long-term platform, networking, and site reliability engineering vision and roadmap
- Build, lead, and mentor globally distributed platform engineering teams
- Instil a DevOps culture of ownership, automation, and continuous improvement
- Oversee cloud infrastructure (AWS & GCP), Kubernetes platforms, and global networking
- Drive automation-first practices including infrastructure-as-code, GitOps, and immutable deployments
- Establish and monitor SLOs, SLIs, and SLAs to ensure reliability and performance
- Lead incident management, blameless postmortems, and resilience engineering initiatives
- Own FinOps and cloud operations, ensuring platforms are cost-efficient and financially accountable
- Partner with Finance and Product to forecast cloud spend and optimise resource usage
- Own the end-to-end CI/CD ecosystem and provide self-service pipelines, test automation, and deployment tooling
- Embed security and compliance checks directly into the CI/CD process
- Champion a “paved road” developer experience that reduces friction and accelerates delivery
- Partner with Security Engineering to embed zero-trust, IAM, TLS, secrets management, and certificate lifecycle automation
- Ensure platform services and pipelines are compliant with PCI DSS, GDPR, and local regulations
- Lead the global networking strategy, ensuring secure, resilient, low-latency connectivity backed by automation and observability
- Collaborate with product and business leaders, and represent the platform function to external partners, regulators, and customers
Requirements
- Minimum 10 years of proven experience leading platform, SRE, or DevOps functions at scale in mission-critical environments
- Deep expertise in cloud-native architectures (AWS, GCP)
- Experience with Kubernetes platforms and distributed systems
- Strong experience with CI/CD, infrastructure-as-code, GitOps, and immutable deployments
- Track record of driving automation and observability platforms
- Strong SRE mindset with experience applying SLIs, SLOs, error budgets, and resilience engineering
- Experience leading incident management, blameless postmortems, and resilience initiatives
- Experience with FinOps and cloud operations (cost-efficiency, right-sizing, forecasting cloud spend)
- Security-first approach with experience embedding compliance into automated pipelines
- Experience with zero-trust, modern IAM, end-to-end TLS, secrets management, and certificate lifecycle automation
- Knowledge of compliance requirements under PCI DSS, GDPR, and local regulations
- Strong leadership skills in building high-performing, distributed engineering teams (global/remote-first preferred)
- Payments knowledge is a plus but not essential
- Experience operating always-on, mission-critical systems