Site Reliability Engineer – Mid-Senior, Operations-Focused

Heidi Health

full-time

Posted on: 2/9/2026

Location Type: Hybrid

Location: London • United Kingdom

Job Level

Senior

Tech Stack

AWS Cloud, Kubernetes, Prometheus, Python, Terraform

About the role

Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
Strengthen observability: Improve dashboards, alerts, logs, and traces so issues are detected earlier and diagnosed faster, with a strong focus on actionable signals.
Reduce operational toil: Automate repetitive tasks, simplify runbooks, and improve tooling to make on-call and day-to-day operations easier and safer.
Support safe change: Improve deployments, rollback mechanisms, and operational readiness to reduce the risk of incidents caused by change.
Contribute to operational practices: Write and maintain runbooks, participate in blameless post-mortems, and help improve incident response processes over time.
Collaborate closely with engineers: Work with product and feature teams to improve production readiness, service ownership, and reliability expectations.

Requirements

3–6+ years in SRE, DevOps, Platform, or operations-heavy engineering roles.
Experience supporting production systems and participating in on-call rotations.
Comfortable debugging live systems under pressure.
Experience operating cloud infrastructure (AWS preferred).
Working knowledge of Kubernetes and containerised workloads.
Infrastructure as Code experience (Terraform or similar).
Familiarity with monitoring and alerting tools (Datadog, Prometheus, etc.).
Scripting or automation experience (Python, Bash, or similar).

Benefits

Real product momentum. We’re not trying to generate interest; we’re channeling it.
Equity from day one. When Heidi wins, you win. You’ll share directly in the success you help create.
Unmatched impact. Play a pivotal role in defining and scaling customer success at a critical growth moment, all while working on a product that delivers tangible value to clinicians and patients every day.
Work alongside world-class talent. Join a team of operators and builders who’ve scaled unicorns.
Global reach. Help shape our international expansion as we bring Heidi to key markets around the world.
Growth and balance. Enjoy a personal development budget, work from anywhere for a month, dedicated wellness days, and your birthday off to recharge.
Flexibility that works. A hybrid environment, with 3 days in the office.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Kubernetes, AWS, Terraform, Python, Bash, monitoring tools, alerting tools, automation, debugging, production systems

Soft Skills

communication, collaboration, problem-solving, incident response, leadership, operational readiness, process improvement, reliability focus, blameless post-mortems, ownership