Tech Stack
Tools & technologies
- Distributed Systems
About the role
Key responsibilities & impact
- Drive Operational Excellence: Design, implement, and maintain highly available, scalable, and resilient systems that deliver an exceptional customer experience
- Datadog Expert: Be one of the go-to experts for Datadog, responsible for defining and implementing best practices
- Software Development for Reliability: Develop robust, well-tested, and maintainable software to automate operational tasks
- Toil Reduction Champion: Identify and eliminate toil through automation and process improvements
- Incident Management & Post-Mortems: Lead blameless post-mortems and contribute to the incident response framework
- Reliability Metrics & Goals: Collaborate to define, implement, and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets
- Infrastructure as Code: Use and contribute to infrastructure-as-code efforts
- System Design & Architecture: Provide SRE expertise in system design reviews
- Knowledge Sharing & Mentorship: Document processes and share expertise with the team
Requirements
What you’ll need
- Demonstrated experience operating and improving production systems at scale in an SRE, Production Engineering, or Platform Engineering role
- Proven ability to rapidly build accurate mental models of complex distributed systems across infrastructure, applications, networking, identity, and observability domains
- Strong troubleshooting skills with a methodical, evidence-driven approach to incident response and root cause analysis
- Experience defining, implementing, and using Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to guide reliability decisions
- Excellent written and verbal communication skills, with the ability to explain complex technical issues clearly to both technical and non-technical audiences
Benefits
Comp & perks
- Flexible work arrangements
- Professional development opportunities
- Continuous improvement culture
- Mentorship opportunities
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
software development, automation, incident management, root cause analysis, infrastructure as code, system design, reliability metrics, troubleshooting, process improvements, scalable systems
Soft Skills
communication, mentorship, collaboration, problem-solving, evidence-driven approach
