Senior Site Reliability Engineer

Virta Health

full-time

Posted on: 8/25/2025

Location: 🇺🇸 United States

Salary

💰 $167,249 - $216,000 per year

Job Level

Senior

Tech Stack

Go, Python, Terraform

About the role

Virta Health is on a mission to transform diabetes care and reverse the type 2 diabetes epidemic. Current treatment approaches aren't working: over half of US adults have either type 2 diabetes or prediabetes. As an SRE on the Infrastructure team at Virta, you will build the foundation that helps our company move as fast as possible while meeting security and compliance requirements.

Key projects for the team over the next two quarters include:

- Implementing an AI-driven observability and metrics platform that automatically detects anomalies and highlights SLO risks, enabling product teams to make data-driven decisions.
- Enhancing system observability, reliability, and efficiency using off-the-shelf technology combined with internal tools developed in Python and Go, to increase transparency and visibility into our systems and to centralize data.
- Building out more products for our Product Development teams, such as observability modules (SLOs, alerting, dashboards), so they can spin up an MVP out of the box.
- Improving incident readiness with better tooling and the right hygiene practices, such as game days.
- Engaging with feature development teams in toil reduction exercises, capacity planning, load testing, the SLO process, and other best practices, including partnering with product teams to replace manual capacity planning with predictive/AI-driven scaling models and to codify self-healing runbooks that minimize toil.
- Improving the velocity and quality of our developer platform and tooling.

General AI fluency is desired: you should be comfortable with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements.

We are in the midst of redefining our incident response tooling and strategy, improving test tooling, and developing a strategy to ensure all applications are performant and available. Joining Virta would make you one of the key people defining and driving the future vision of what reliability and observability should look like.

Requirements

- Highly proficient in shipping backend code in high-quality production environments, with strong hands-on coding and automation expertise and a deep understanding of reliability and production readiness practices.
- Hands-on expertise with automation and infrastructure-as-code (Terraform modules preferred), ideally with experience in observability.
- Experience designing and implementing highly observable, scalable systems that support large numbers of users while reducing operational burden, with a proven track record configuring AIOps / ML-based monitoring platforms.
- Applied and general AI fluency: ability to leverage AI/ML-assisted observability (e.g., anomaly detection, error-budget burn prediction) while also being comfortable with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements.
- Growth mindset and craftsmanship: ability to coach, mentor, and evangelize AI-first insights while continually improving engineering practices and following best practices.
