Tech Stack
Tools & technologies: AWS, Prometheus, Python, Terraform
About the role
Key responsibilities & impact
- Implement the Observability Ladder and define SLAs, SLIs, and SLOs.
- Build deployment tooling that allows teams to automate rollbacks when error budgets are depleted.
- Drive a blameless post-mortem culture focused on actionable takeaways and measurable metrics.
- Continuously improve alerting and on-call frameworks to reduce alert fatigue.
- Develop systems for pre- and post-deployment verification.
- Lead the effort to manage the reliability suite through IaC using Terraform.
Requirements
What you’ll need
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience in Software Engineering, SRE, DevOps, or Platform Engineering.
- Strong coding fluency: proficiency in Python or a similar language.
- Hands-on experience with AWS, and a solid understanding of Infrastructure as Code (Terraform or CloudFormation).
- Demonstrable experience with monitoring tools (Datadog, Prometheus, ELK stack).
- Strong understanding of SRE concepts including Golden Signals and error budget mathematics.
- Proven ability to define and drive reliability standards across teams.
Benefits
Comp & perks
- Flexibility and the freedom to work remotely.
- Work-life balance where you are not expected to work over weekends or after hours.
- A forward-thinking remote company that provides virtual social platforms for employee engagement.
- A monthly work from home allowance.
- A MacBook or Windows laptop for you to do your best work on.
- Support for your career growth and celebration of your successes and advancement.
