
Principal Site Reliability Engineer
Orgvue
full-time
Location Type: Hybrid
Location: London • United Kingdom
About the role
- Define and enforce SLOs, SLIs, and error budgets across critical services
- Craft and implement a cloud infrastructure and tooling strategy
- Work across the organisation to raise the standard of SRE practices
- Help implement robust observability (metrics, logs, and traces) using our observability tooling
- Guide the team in building automated, self-healing systems
- Own and evolve our incident response processes, including on-call practices and post-mortem culture
- Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
- Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation, and GitOps practices
- Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
- Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform
Requirements
- Demonstrable experience leading SRE transformations
- Deep hands-on expertise with **Kubernetes** (EKS preferred) in production environments
- Strong experience with **AWS core services** (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
- Expert in **Infrastructure as Code** using tools such as **Terraform**, with knowledge of GitOps workflows
- Strong background in observability: metrics, visualization, logging, and tracing
- Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
- Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews
Benefits
- Hybrid working: at least 1 day a week in the London office
- Wellbeing: Sanctus coaching, virtual fitness sessions, wellbeing webinars, annual wellbeing day
- Subsidised Gym Membership
- Private Medical Insurance (including Dental and Vision) and Life Assurance
- 25 days holiday (increasing to 30 days at a rate of 1 extra day per year)
- Summer Fridays (half-day Fridays for the months of July and August)
- Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3%
- Season Ticket Loan
- Cycle to Work Scheme
- Annual Discretionary Bonus
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SLOs, SLIs, error budgets, Infrastructure as Code, Terraform, Kubernetes, AWS core services, observability, CI/CD pipelines, incident management
Soft Skills
mentoring, collaboration, leadership, communication, operational readiness