Senior Site Reliability Engineer
Drata
Senior Site Reliability Engineer role at Drata focused on reliability architecture and automation. You will collaborate within SRE teams to improve system reliability through proactive measures and automation practices.
Posted 4/28/2026 • full-time • San Francisco • California • 🇺🇸 United States • Senior • 💰 $166,900 - $225,900 per year
Tech Stack
Tools & technologies
- AWS
- Cloud
- Docker
- Kubernetes
- Linux
- MySQL
- Python
- Terraform
About the role
Key responsibilities & impact
- You are the reliability expert for your aligned product team
- Lead Production Readiness Reviews (PRRs) before new services launch
- Partner with product engineering leads and staff engineers to define SLOs and SLIs for critical services
- Participate in team planning and architecture reviews to provide proactive reliability guidance
- Build reusable artifacts: SLO templates, observability checklists, alerting standards, reference dashboards
- Build and maintain Datadog monitors, dashboards, and alert routing
- Handle infrastructure requests: ECS task management, secret rotations, Terraform changes, capacity adjustments
- Identify repeated manual work and convert it into self-service tooling or runbooks
- Design and build shared platform infrastructure: reusable Terraform modules, standardized observability stacks, service templates
- Participate in the on-call rotation and lead incident response when needed
Requirements
What you’ll need
- 6+ years of experience in Site Reliability Engineering, Cloud Engineering, or building and maintaining scalable, resilient services
- Robust knowledge of cloud computing technologies: Terraform, Docker, Git, and Linux
- Hands-on experience with Datadog for monitoring, alerting, dashboards, SLO tracking, and distributed tracing
- Experience building software systems as a software engineer
- Experience developing tooling and automation in Python and/or Bash
- Experience with CI/CD pipeline automation, specifically GitHub Actions
- Experience with disaster recovery practices and incident management
- Strong understanding of observability concepts (monitoring, logging, distributed tracing, and metrics) and how to apply them to production systems
- Experience with container orchestration and deployment technologies including AWS ECS Fargate and/or Kubernetes
- Experience working with relational databases (MySQL proficiency is a plus)
Benefits
Comp & perks
- Up to 100% employer-paid premiums for medical, dental, and vision coverage for employees and their dependents
- Comprehensive wellness benefits and healthcare concierge services
- 401(k) plan
- Company-paid life and disability insurance
- Tax-advantaged spending accounts
- Paid Parental Leave policy after six months of employment
- Access to Kindbody fertility and family-building benefits
- Paid time off and flexible vacation policy
- Generous annual stipends for professional and personal development