Site Reliability Engineer

LifeStance Health

full-time

Posted on: 9/3/2025

Origin: 🇺🇸 United States • Arizona


💰 $140,000 - $160,000 per year

Senior, Lead

AWS, Cloud, Distributed Systems, Go, Google Cloud Platform, JavaScript, Kubernetes, Microservices, Prometheus, Python, Terraform, Vault

About the role

At LifeStance Health, we’re building the future of mental healthcare—and we need a Senior Site Reliability Engineer to architect and safeguard the mission-critical infrastructure behind our national digital health platform.
This is not just a support role. You’ll be a principal engineer shaping how our platform scales securely and reliably to serve millions.
Define service-level objectives (SLOs), lead reliability reviews, champion incident response, and ensure production readiness is embedded in our engineering DNA.
Architect scalable, secure infrastructure on AWS using EKS, Lambda, and edge networking strategies.
Drive incident response operations, lead postmortems, and institutionalize RCA learnings.
Automate everything (provisioning, security controls, deployments, chaos testing, DR drills) using Terraform, Helm, and GitHub Actions.
Build and maintain the observability stack (Datadog, Prometheus, ELK, OpenTelemetry); deliver actionable dashboards and alerts.
Implement and maintain zero-trust IAM and secrets management frameworks (Vault, AWS Secrets Manager).
Lead platform reliability reviews and collaborate with engineers, security, and compliance teams to harden architecture.
Mentor engineers, lead production reviews, and evolve the reliability mindset company-wide.

10+ years in DevOps/SRE/Platform Engineering roles, including at least 4 years architecting distributed cloud-native systems at scale.
Expert in AWS core services (EKS, VPC, RDS, Route 53, IAM, Lambda); Terraform-first mindset.
Proven track record in establishing SLIs/SLOs, building error budgets, and aligning them with business velocity.
Deep expertise in Kubernetes (EKS), Helm, service meshes (Istio/Linkerd), and microservices orchestration.
Strong software engineering fundamentals in Python, Go, or similar.
Hands-on experience with modern observability platforms and real-time monitoring solutions.
Technical leadership in incident response, risk management, and operational resilience in regulated industries.
Ability to translate system architecture into platform strategy and influence executive stakeholders.
Preferred: Certifications (AWS DevOps Pro, GCP SRE/Architect, Certified Kubernetes Administrator (CKA)).
Preferred: Experience with hybrid/multi-cloud systems and edge deployments.
Preferred: Experience deploying and securing healthcare platforms (HIPAA, FHIR, HL7).
Preferred: Published thought leadership or open-source contributions in reliability, observability, or infrastructure automation.
Additional Requirements: Must be legally authorized to be employed in the United States.