LifeStance Health

Site Reliability Engineer

full-time

Origin: 🇺🇸 United States • Arizona

Salary

💰 $140,000 - $160,000 per year

Job Level

Senior, Lead

Tech Stack

AWS, Cloud, Distributed Systems, Go, Google Cloud Platform, JavaScript, Kubernetes, Microservices, Prometheus, Python, Terraform, Vault

About the role

  • At LifeStance Health, we’re building the future of mental healthcare—and we need a Senior Site Reliability Engineer to architect and safeguard the mission-critical infrastructure behind our national digital health platform.
  • This is not just a support role. You’ll be a principal engineer shaping how our platform scales securely and reliably to serve millions.
  • Define service-level objectives (SLOs), lead reliability reviews, champion incident response, and ensure production readiness is embedded in our engineering DNA.
  • Architect scalable, secure infrastructure on AWS using EKS, Lambda, and edge networking strategies.
  • Drive incident response operations, lead postmortems, and institutionalize RCA learnings.
  • Automate everything (provisioning, security controls, deployments, chaos experiments, DR drills) using Terraform, Helm, and GitHub Actions.
  • Build and maintain the observability stack (Datadog, Prometheus, ELK, OpenTelemetry); deliver actionable dashboards and alerts.
  • Implement and maintain zero-trust IAM and secrets management frameworks (Vault, AWS Secrets Manager).
  • Lead platform reliability reviews and collaborate with engineers, security, and compliance teams to harden architecture.
  • Mentor engineers, lead production reviews, and evolve the reliability mindset company-wide.

Requirements

  • 10+ years in DevOps/SRE/Platform Engineering roles, including at least 4 years architecting distributed cloud-native systems at scale.
  • Expert in AWS core services (EKS, VPC, RDS, Route 53, IAM, Lambda); Terraform-first mindset.
  • Proven track record in establishing SLIs/SLOs, building error budgets, and aligning them with business velocity.
  • Deep expertise in Kubernetes (EKS), Helm, service meshes (Istio/Linkerd), and microservices orchestration.
  • Strong software engineering fundamentals in Python, Go, or similar.
  • Hands-on experience with modern observability platforms and real-time monitoring solutions.
  • Technical leadership in incident response, risk management, and operational resilience in regulated industries.
  • Ability to translate system architecture into platform strategy and influence executive stakeholders.
  • Preferred: Certifications including AWS DevOps Pro, GCP SRE/Architect, or Certified Kubernetes Administrator (CKA).
  • Preferred: Experience with hybrid/multi-cloud systems and edge deployments.
  • Preferred: Experience deploying and securing healthcare platforms (HIPAA, FHIR, HL7).
  • Preferred: Published thought leadership or open-source contributions in reliability, observability, or infrastructure automation.
  • Additional Requirements: Must be legally authorized to be employed in the United States.