Salary
💰 $167,249 - $216,000 per year
Tech Stack
Go, Python, Terraform
About the role
Virta Health is on a mission to transform diabetes care and reverse the type 2 diabetes epidemic. Current treatment approaches aren’t working: over half of US adults have either type 2 diabetes or prediabetes.
As an SRE on the Infrastructure team at Virta, you will be building the foundation that will help our company move as fast as possible while meeting security and compliance requirements.
Key projects for the team over the next two quarters include:
- Implementing an AI-driven observability and metrics platform that automatically detects anomalies and highlights SLO risks, enabling product teams to make data-driven decisions.
- Enhancing system observability, reliability, and efficiency by combining off-the-shelf technology with internal tools developed in Python and Go, increasing transparency and visibility into our systems and centralizing data.
- Building out more products for our Product Development teams, such as observability modules (SLOs, alerting, dashboards), so they can spin up an MVP out of the box.
- Improving incident readiness with better tooling and the right hygiene practices, such as game days.
- Engaging with feature development teams on toil reduction exercises, capacity planning, load testing, the SLO process, and other best practices, and partnering with product teams to replace manual capacity planning with predictive/AI-driven scaling models and to codify self-healing runbooks that minimize toil.
- Improving the velocity and quality of our developer platform and tooling.
General AI fluency desired: comfortable with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements
We are in the midst of redefining our incident response tooling and strategy, improving test tooling, and developing a strategy to ensure all applications are performant and available. Joining Virta would make you one of the key people defining and driving the future vision of what reliability and observability should look like.
Requirements
- Highly proficient in shipping high-quality backend code to production environments, with strong hands-on coding and automation expertise and a deep understanding of reliability and production readiness practices.
- Hands-on expertise with automation and infrastructure-as-code (Terraform modules preferred), ideally with experience in observability.
- Experience designing and implementing highly observable, scalable systems that support large numbers of users while reducing operational burden, with a proven track record configuring AIOps / ML-based monitoring platforms.
- Applied and general AI fluency: ability to leverage AI/ML-assisted observability (e.g., anomaly detection, error-budget burn prediction) while also being comfortable with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements.
- Growth mindset and craftsmanship: ability to coach, mentor, and evangelize AI-first insights while continually improving engineering practices and adhering to best practices.