Senior Site Reliability Engineer

Drata

Drata is hiring a Senior Site Reliability Engineer to focus on reliability architecture and automation. The role collaborates within SRE teams to improve system reliability through proactive measures and automation practices.

Posted 4/28/2026 • Full-time • San Francisco, California, United States • Senior • $166,900 - $225,900 per year

Tech Stack

Tools & technologies

AWS, Cloud, Docker, Kubernetes, Linux, MySQL, Python, Terraform

About the role

Key responsibilities & impact

Serve as the reliability expert for your aligned product team
Lead Production Readiness Reviews (PRRs) before new services launch
Partner with product engineering leads and staff engineers to define SLOs and SLIs for critical services
Participate in team planning and architecture reviews to provide proactive reliability guidance
Build reusable artifacts: SLO templates, observability checklists, alerting standards, reference dashboards
Build and maintain Datadog monitors, dashboards, and alert routing
Handle infrastructure requests: ECS task management, secret rotations, Terraform changes, capacity adjustments
Identify repeated manual work and convert it into self-service tooling or runbooks
Design and build shared platform infrastructure: reusable Terraform modules, standardized observability stacks, service templates
Participate in the on-call rotation and lead incident response when needed

Requirements

What you’ll need

6+ years of experience in Site Reliability Engineering, Cloud Engineering, or building and maintaining scalable, resilient services
Strong knowledge of cloud and infrastructure technologies: Terraform, Docker, Git, and Linux
Hands-on experience with Datadog for monitoring, alerting, dashboards, SLO tracking, and distributed tracing
Experience building software systems as a software engineer
Experience developing tooling and automation in Python and/or Bash
Experience with CI/CD pipeline automation, specifically GitHub Actions
Experience with disaster recovery practices and incident management
Strong understanding of observability concepts (monitoring, logging, distributed tracing, and metrics) and how to apply them to production systems
Experience with container orchestration and deployment technologies including AWS ECS Fargate and/or Kubernetes
Experience working with relational databases (MySQL proficiency is a plus)

Benefits

Comp & perks

Up to 100% employer-paid premiums for medical, dental, and vision coverage for employees and their dependents
Comprehensive wellness benefits and healthcare concierge services
401(k) plan
Company-paid life and disability insurance
Tax-advantaged spending accounts
Paid parental leave after six months of employment
Access to Kindbody fertility and family-building benefits
Paid time off and flexible vacation policy
Generous annual stipends for professional and personal development

ATS Keywords

Applicant Tracking System Keywords

Hard Skills & Tools

Site Reliability Engineering, Cloud Engineering, Terraform, Docker, Git, Linux, Python, Bash, CI/CD, MySQL

Soft Skills

leadership, collaboration, proactive guidance, incident response, problem-solving