Darede

Senior Site Reliability Engineer

The Senior SRE will lead the transition of operations to a reliability culture in a cloud environment, designing and implementing solutions that prevent system failures in business-critical applications.

Posted 4/27/2026 • Full-time • Remote • 🇧🇷 Brazil • Senior

Tech Stack

Tools & technologies
AWS, Docker, EC2, Go, Kubernetes, .NET, Oracle, Postgres, Python, Terraform

About the role

Key responsibilities & impact
  • **Incident Leadership:** Act as Incident Response Lead in War Rooms, coordinating technical remediation and communication with stakeholders.
  • **Observability Engineering:** Design and evolve telemetry in Datadog (Logs, APM, Traces and business metrics) to reduce MTTD and the team's cognitive load.
  • **Workload Management on AWS Amplify:** Ensure the resilience and scalability of hosted front-end applications and critical APIs.
  • **SRE Governance:** Define and monitor SLIs, SLOs and SLAs, managing the Error Budget to balance delivery speed with stability.
  • **Mitigation Automation:** Develop auto-healing tools and scripts (automatic rollback, controlled restart, component isolation).
  • **Root Cause Analysis:** Lead blameless post-mortem processes and ensure the implementation of structural improvements to prevent recurrence.
  • **Systems Modernization:** Work with development teams to implement resilience patterns (Circuit Breakers, Bulkheads and Rate Limiting) in both modern architectures and legacy systems.
  • **AI in Operations:** Implement anomaly detection and intelligent response solutions using AIOps (Datadog Bits AI or AWS DevOps Agent).

Requirements

What you’ll need
  • **Proven Seniority in SRE or DevOps:** Solid experience in high-scale, mission-critical environments.
  • **Deep AWS Expertise:** Advanced experience with EC2, RDS, S3, IAM, EKS and Amplify.
  • **Observability Tools:** Strong experience in monitoring, logging and APM (preferably using Datadog).
  • **Containers & Orchestration:** Strong knowledge of Docker and Kubernetes (EKS/GKE).
  • **Infrastructure as Code (IaC):** Proficiency in Terraform.
  • **Development/Scripting:** Proficient in Python, Go or Shell scripting for automation.
  • **Incident Management:** Real experience with on-call rotations and real-time problem resolution.
  • **Plus / Nice-to-haves:**
      • **Analytical Profile for Legacy Systems:** Experience troubleshooting .NET Framework applications and Oracle or PostgreSQL databases.
      • **Chaos Engineering:** Experience executing controlled stress and resilience tests.
      • **Certifications:** AWS Certified DevOps Engineer - Professional or official Datadog certifications.

Benefits

Comp & perks
  • 📚 Educational Incentives (Partnerships with Educational Institutions)
  • 🌴 Paid Vacation
  • 🏋️ TotalPass
  • 🎂 Birthday off
  • 🏥 Health Insurance
  • 🦷 Dental Insurance
  • 🤰 Maternity Leave
  • 👨‍👩‍👧‍👦 Paternity Leave
  • 🌟 Reimbursement for AWS Certifications
