Grupo PRIMO

SRE, Pleno

full-time

Location Type: Hybrid

Location: Barueri, Brazil

About the role

  • Define and implement SLI/SLOs for critical services (latency, availability, error rate)
  • Establish company-wide observability standards (structured logs, distributed traces, metrics – RED/USE)
  • Configure dashboards and alerts in Datadog (SLO tracking, burn rate, anomaly detection)
  • Create and maintain runbooks for troubleshooting and incident response
  • Participate in blameless postmortems and ensure implementation of improvements
  • Enable engineering teams to adopt reliability standards (office hours, pairing, documentation)
  • Map and monitor costs by product, team, and environment
  • Identify and eliminate waste (idle resources, old snapshots, unused volumes)
  • Implement optimization automations (automatic shutdown, rightsizing, orphaned resource cleanup)
  • Configure cost anomaly alerts and budget tracking
  • Collaborate with teams to validate and execute optimizations
  • Conduct weekly office hours
  • Document standards, runbooks, and processes in a clear, easy-to-use form
  • Pair with developers to implement standards
  • Collect feedback and propose continuous improvements
  • Present results in monthly reviews and all-hands

Requirements

  • Observability: structured logs, distributed traces, metrics (golden signals)
  • Platforms: Datadog, New Relic, Grafana/Prometheus, ELK or similar
  • Cloud: Strong experience in AWS, GCP or Azure
  • Automation: Python, Bash or Go
  • IaC: Terraform, CloudFormation, Pulumi or similar
  • CI/CD: Knowledge of pipelines (GitHub Actions, GitLab CI, Jenkins)
  • Containers: Docker and Kubernetes (deployments, services, ingress)
  • Advanced Datadog (APM, SLO Tracking, Cloud Cost Management) - Plus
  • Practical experience with SLO/error budgets in production - Plus
  • FinOps (tagging, budgets, anomaly detection, cost optimization) - Plus
  • DORA metrics and DevEx practices - Plus
  • Incident management, on-call and structured postmortems - Plus
  • End-to-end ownership and accountability - Behavioral
  • Consistent presence and proactive communication - Behavioral
  • Pragmatism and focus on incremental deliveries - Behavioral
  • Clear communication for technical and executive audiences - Behavioral
  • Enablement mindset - Behavioral
  • Continuous learning and autonomy - Behavioral

Benefits

  • Semiannual Variable Bonus
  • Meal Allowance and Food Voucher, provided on the iFood flexible card
  • SulAmérica Health Plan
  • SulAmérica Dental Plan
  • Total Pass
  • Life Insurance
  • Commuter Allowance
  • Childcare Assistance
  • Access to Grupo Primo platforms

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
SLI, SLO, structured logs, distributed traces, metrics, Python, Bash, Go, Terraform, CloudFormation
Soft Skills
end-to-end ownership, accountability, proactive communication, clear communication, enablement mindset, continuous learning, pragmatism, focus on incremental deliveries