
Site Reliability Engineer – Mid-Senior, Operations-Focused
Heidi Health
full-time
Location Type: Hybrid
Location: London • United Kingdom
About the role
- Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
- Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
- Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
- Strengthen observability: Improve dashboards, alerts, logs, and traces so issues are detected earlier and diagnosed faster, with a strong focus on actionable signals.
- Reduce operational toil: Automate repetitive tasks, simplify runbooks, and improve tooling to make on-call and day-to-day operations easier and safer.
- Support safe change: Improve deployments, rollback mechanisms, and operational readiness to reduce the risk of incidents caused by change.
- Contribute to operational practices: Write and maintain runbooks, participate in blameless post-mortems, and help improve incident response processes over time.
- Collaborate closely with engineers: Work with product and feature teams to improve production readiness, service ownership, and reliability expectations.
Requirements
- 3–6+ years in SRE, DevOps, Platform, or operations-heavy engineering roles.
- Experience supporting production systems and participating in on-call rotations.
- Comfortable debugging live systems under pressure.
- Experience operating cloud infrastructure (AWS preferred).
- Working knowledge of Kubernetes and containerised workloads.
- Infrastructure as Code experience (Terraform or similar).
- Familiarity with monitoring and alerting tools (Datadog, Prometheus, etc.).
- Scripting or automation experience (Python, Bash, or similar).
Benefits
- Real product momentum. We’re not trying to generate interest; we’re channeling it.
- Equity from day one. When Heidi wins, you win. You’ll share directly in the success you help create.
- Unmatched impact. Play a pivotal role in defining and scaling customer success at a critical growth moment, all while working on a product that delivers tangible value to clinicians and patients every day.
- Work alongside world-class talent. Join a team of operators and builders who’ve scaled unicorns.
- Global reach. Help shape our international expansion as we bring Heidi to key international markets.
- Growth and balance. Enjoy a personal development budget, work from anywhere for a month, dedicated wellness days, and your birthday off to recharge.
- Flexibility that works. A hybrid environment, with 3 days in the office.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, AWS, Terraform, Python, Bash, monitoring tools, alerting tools, automation, debugging, production systems
Soft Skills
communication, collaboration, problem-solving, incident response, leadership, operational readiness, process improvement, reliability focus, blameless post-mortems, ownership