Tech Stack
AWS, Azure, Cloud, DNS, Grafana, Kubernetes, Prometheus, Terraform
About the role
- Own the cloud platform strategy and roadmap across product, partner, and data domains, aligned to business priorities and cost targets.
- Build and operate secure, scalable multi‑account cloud foundations with strong guardrails, identity boundaries, and least‑privilege access.
- Define and maintain golden paths for developers: repo bootstraps, CI/CD templates, IaC modules, and service archetypes.
- Lead and evolve CI/CD: trunk‑based development, pipelines, quality gates, artifact management, and progressive delivery.
- Establish robust platform SRE practices: SLIs/SLOs, error budgets, incident response, post‑incident reviews, and capacity planning.
- Implement platform security controls: secrets management, image scanning, SBOMs, policy‑as‑code, vulnerability management, and compliance reporting (e.g., ISO 27001 alignment).
- Provide first‑class observability: logging, metrics, traces, dashboards, and alerting with actionable runbooks.
- Standardise infrastructure as code (Terraform / Helm) and GitOps for environments from dev to production.
- Partner with Data and Analytics to run reliable, governed data platforms, including compute, storage, orchestration, and secure data access.
- Support partner/client projects with repeatable patterns for secure landing zones, connectivity, and deployment workflows.
- Manage cloud cost visibility and optimisation: budgets, tagging, chargeback/showback, rightsizing, and lifecycle policies.
- Select and run core platform tooling: secrets, build, artifact, container registry, runtime, observability, and incident tooling.
- Lead a small, high‑impact DevOps/SRE team: hire and mentor engineers, and establish on‑call rotations, ways of working, and career growth paths.
- Collaborate with Engineering, Data, Security, and IT to align platform standards, SSO/identity, and endpoint/device policies.
- Manage vendors and contracts for cloud and platform services, holding suppliers to SLAs and security requirements.
Requirements
- Proven experience leading DevOps/SRE or platform engineering for cloud‑native products and data workloads.
- Deep expertise in:
  - Cloud & Networking: AWS or Azure core services, multi‑account design, VPC/VNet, private networking, load balancing, DNS, CDN.
  - Containers & Runtime: Kubernetes or ECS/AKS, serverless, service mesh, autoscaling, blue/green and canary deploys.
  - CI/CD & Automation: GitHub Actions, GitLab CI, Azure DevOps, or similar. Trunk‑based development and pipeline hardening.
  - Infrastructure as Code: Terraform or Pulumi. Modularisation, environments, and policy‑as‑code (e.g., OPA/Conftest).
  - Security & Compliance: Secrets management, IAM/Entra, key management, image scanning, SBOM, ISO 27001 practices.
  - Observability & Reliability: Prometheus/Grafana, CloudWatch, OpenTelemetry, log aggregation, alerting, SLOs, incident response.
  - Data Platform Basics: Storage tiers, orchestration, data pipelines, access controls, and cost governance.
- Track record of creating developer golden paths and reducing lead time for changes while improving reliability.
- Strong stakeholder skills and the ability to balance speed, security, and cost.
- Excellent communication and leadership, with experience building and mentoring platform teams.
- Relevant certifications or equivalent experience are beneficial (e.g., AWS/Azure, CKAD/CKA, CISSP, ITIL for service practices).
Benefits
- Health insurance
- Retirement plans
- Paid time off
- Flexible working arrangements
- Professional development
- Equipment allowances