QAD

Senior Site Reliability Engineer – SRE

As a Senior Site Reliability Engineer at Redzone, you will ensure the reliability and performance of mission-critical services, evolve SRE practices, and drive automation and operational excellence within the team.

Posted 6/19/2026 • Full-time • Remote • 🇪🇸 Spain • Senior

Tech Stack

Tools & technologies
Distributed Systems

About the role

Key responsibilities & impact
  • Drive Operational Excellence: Design, implement, and maintain highly available, scalable, and resilient systems that deliver an exceptional customer experience
  • Datadog Expert: Be one of the go-to experts for Datadog, responsible for defining and implementing best practices
  • Software Development for Reliability: Develop robust, well-tested, and maintainable software to automate operational tasks
  • Toil Reduction Champion: Identify and eliminate toil through automation and process improvements
  • Incident Management & Post-Mortems: Lead blameless post-mortems and contribute to the incident response framework
  • Reliability Metrics & Goals: Collaborate to define, implement, and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets
  • Infrastructure as Code: Leverage and contribute to infrastructure as code efforts
  • System Design & Architecture: Provide SRE expertise in system design reviews
  • Knowledge Sharing & Mentorship: Document processes and share expertise with the team

Requirements

What you’ll need
  • Demonstrated experience operating and improving production systems at scale in an SRE, Production Engineering, or Platform Engineering role
  • Proven ability to rapidly build accurate mental models of complex distributed systems across infrastructure, applications, networking, identity, and observability domains
  • Strong troubleshooting skills with a methodical, evidence-driven approach to incident response and root cause analysis
  • Experience defining, implementing, and using Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to guide reliability decisions
  • Excellent written and verbal communication skills, with the ability to explain complex technical issues clearly to both technical and non-technical audiences

Benefits

Comp & perks
  • Flexible work arrangements
  • Professional development opportunities
  • Continuous improvement culture
  • Mentorship opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
software development, automation, incident management, root cause analysis, infrastructure as code, system design, reliability metrics, troubleshooting, process improvements, scalable systems
Soft Skills
communication, mentorship, collaboration, problem-solving, evidence-driven approach