
SRE, Mid-Level (Pleno)
Grupo PRIMO
Full-time
Location Type: Hybrid
Location: Barueri • Brazil
About the role
- Define and implement SLIs and SLOs for critical services (latency, availability, error rate)
- Establish company-wide observability standards (structured logs, distributed traces, metrics – RED/USE)
- Configure dashboards and alerts in Datadog (SLO tracking, burn rate, anomaly detection)
- Create and maintain runbooks for troubleshooting and incident response
- Participate in blameless postmortems and ensure implementation of improvements
- Enable engineering teams to adopt reliability standards (office hours, pairing, documentation)
- Map and monitor costs by product, team, and environment
- Identify and eliminate waste (idle resources, old snapshots, unused volumes)
- Implement optimization automations (automatic shutdown, rightsizing, orphaned resource cleanup)
- Configure cost anomaly alerts and budget tracking
- Collaborate with teams to validate and execute optimizations
- Conduct weekly office hours
- Document standards, runbooks, and processes in a clear, easy-to-consume form
- Pair with developers to implement standards
- Collect feedback and propose continuous improvements
- Present results in monthly reviews and all-hands
Requirements
- Observability: structured logs, distributed traces, metrics (golden signals)
- Platforms: Datadog, New Relic, Grafana/Prometheus, ELK or similar
- Cloud: Strong experience with AWS, GCP or Azure
- Automation: Python, Bash or Go
- IaC: Terraform, CloudFormation, Pulumi or similar
- CI/CD: Knowledge of pipelines (GitHub Actions, GitLab CI, Jenkins)
- Containers: Docker and Kubernetes (deployments, services, ingress)
- Advanced Datadog (APM, SLO Tracking, Cloud Cost Management) - Plus
- Practical experience with SLO/error budgets in production - Plus
- FinOps (tagging, budgets, anomaly detection, cost optimization) - Plus
- DORA metrics and DevEx practices - Plus
- Incident management, on-call and structured postmortems - Plus
- End-to-end ownership and accountability - Behavioral
- Consistent presence and proactive communication - Behavioral
- Pragmatism and focus on incremental deliveries - Behavioral
- Clear communication for technical and executive audiences - Behavioral
- Enablement mindset - Behavioral
- Continuous learning and autonomy - Behavioral
Benefits
- Semiannual Variable Bonus
- Meal Allowance and Food Voucher, available on the iFood flexible card
- SulAmérica Health Plan
- SulAmérica Dental Plan
- TotalPass
- Life Insurance
- Commuter Allowance
- Childcare Assistance
- Access to Grupo Primo platforms
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SLI, SLO, structured logs, distributed traces, metrics, Python, Bash, Go, Terraform, CloudFormation
Soft Skills
end-to-end ownership, accountability, proactive communication, clear communication, enablement mindset, continuous learning, pragmatism, focus on incremental deliveries