Senior Site Reliability Engineer
Heidi Health
Senior Site Reliability Engineer supporting production systems for Heidi's AI Care Partner. Focused on incident response, system reliability, and day-to-day operations in a hybrid environment.
Tech Stack
Tools & technologies: AWS, Cloud, Kubernetes, Prometheus, Python, Terraform
About the role
Key responsibilities & impact
- Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
- Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
- Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
- Strengthen observability: Improve dashboards, alerts, logs, and traces so issues are detected earlier and diagnosed faster, with a strong focus on actionable signals.
- Reduce operational toil: Automate repetitive tasks, simplify runbooks, and improve tooling to make on-call and day-to-day operations easier and safer.
- Support safe change: Improve deployments, rollback mechanisms, and operational readiness to reduce the risk of incidents caused by change.
- Contribute to operational practices: Write and maintain runbooks, participate in blameless post-mortems, and help improve incident response processes over time.
- Collaborate closely with engineers: Work with product and feature teams to improve production readiness, service ownership, and reliability expectations.
Requirements
What you’ll need
- 3–6+ years in SRE, DevOps, Platform, or operations-heavy engineering roles.
- Experience supporting production systems and participating in on-call rotations.
- Comfortable debugging live systems under pressure.
- Experience operating cloud infrastructure (AWS preferred).
- Working knowledge of Kubernetes and containerised workloads.
- Infrastructure as Code experience (Terraform or similar).
- Familiarity with monitoring and alerting tools (Datadog, Prometheus, etc.).
- Scripting or automation experience (Python, Bash, or similar).
Benefits
Comp & perks
- Your health, covered. Comprehensive private medical and dental cover through Bupa, plus 24/7 mental health, coaching and wellbeing support through Sonder and a £100/month Healthy Heidi’s stipend.
- Global parental leave. 26 weeks paid for primary carers and 18 weeks for secondary carers, subject to eligibility.
- Fertility support. £7,000 one-off payment, eligibility applies.
- Learning & development. £700 per year for courses, books, memberships, conferences and more.
- Home office budget. £500 one-off to set up a workspace you actually want to work in.
- Recharge days after major milestones and busy periods so you can reset and come back strong.
- Work from anywhere for up to 4 weeks per year, wherever the world takes you.
- Clinical leave. 10 days per year for eligible clinical roles to maintain accreditation and requirements.
- Flexibility that works. A hybrid environment, with 3 days in the office.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, AWS, Terraform, Python, Bash, monitoring tools, alerting tools, automation, debugging, production systems
Soft Skills
communication, collaboration, problem-solving, incident response, leadership, process improvement, operational readiness, blameless post-mortems, reliability expectations, service ownership