Tech Stack
Tools & technologies: AWS, Cloud, Postgres, Terraform
About the role
Key responsibilities & impact
- Partner with service teams to define meaningful SLIs and SLOs grounded in customer experience, and build the error budget policies that turn them into engineering decisions
- Own and evolve the Operational Readiness Review (ORR) process — conducting reviews for new services and major changes across observability, alerting, runbooks, capacity, and graceful degradation
- Strengthen the incident-to-improvement pipeline: connecting postmortem findings to operational readiness gaps, identifying repeat failure patterns, and driving systemic fixes
- Act as the reliability expert teams pull in for architecture reviews, failure mode analysis, dependency mapping, and resilience design
- Identify and quantify operational toil across the org, and build or advocate for automation that eliminates it
- Help teams design sustainable on-call practices: alert quality, escalation paths, runbook coverage, and noise reduction
- Track and report on org-wide operational maturity, surfacing systemic gaps and driving remediation
Requirements
What you’ll need
- Have 7+ years of experience in SRE, production engineering, or reliability-focused roles, including experience shaping SRE practices and driving adoption across engineering teams
- Have a software engineering mindset — you write code and build tools, not just configure them
- Have hands-on experience defining and operationalizing SLOs/SLIs at scale, including error budget policies that actually influenced engineering decisions
- Have deep experience with incident response, postmortem facilitation, and turning incident learnings into systemic improvements
- Have worked with large-scale multi-tenant systems (bonus: managed database platforms or Postgres)
- Are proficient with cloud infrastructure (AWS preferred) and infrastructure-as-code (Pulumi preferred, Terraform/CDK also acceptable)
- Communicate clearly and persuasively — this role requires influencing without authority across a distributed org
- Have experience in async or globally distributed teams
- Are energized by making other teams more effective rather than being the one who fixes everything
Benefits
Comp & perks
- Fully Remote
- ESOP
- Tech Allowance
- Health Benefits
- Annual Off-Sites
- Flexible Work
- Professional Development
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SRE, production engineering, reliability engineering, SLOs, SLIs, incident response, postmortem facilitation, infrastructure-as-code, automation, error budget policies
Soft Skills
communication, influencing without authority, collaboration, problem-solving, organizational skills, leadership, adaptability, persuasiveness, team effectiveness, critical thinking
