Manager, Site Reliability Engineer – Platform

Visa

Full-time

Posted on: 2/26/2026

Location Type: Remote

Location: Brazil

Job Level

Mid-Level, Senior

Tech Stack

AWS, Azure, Cloud, Kubernetes, Microservices, Terraform

About the role

Act as the technical owner of the Platform Squad, defining, driving, and enforcing platform standards across the full lifecycle (design, rollout, upgrades, and decommissioning) for cloud infrastructure, Kubernetes, and Service Mesh
Ensure platform components are designed and operated according to SRE principles, focusing on reliability, scalability, and operational simplicity
Drive architectural decisions with a sustainable platform vision, balancing innovation, security, and operational stability
Define, build, and continuously improve operational processes for internal and external consumers, including platform onboarding and adoption; change management and release processes; and incident, problem, and escalation management
Act as a point of escalation for complex platform incidents and reliability risks, participating in on-call rotations as needed
Ensure platform operations comply with internal controls, audit requirements, and security standards
Establish and own platform observability standards, ensuring consistent implementation of the Golden Signals (latency, traffic, errors, and saturation)
Define and track platform SLIs, SLOs, and error budgets in partnership with internal consumers
Use metrics and operational data to drive prioritization, reliability improvements, and capacity planning decisions
Foster a collaborative, servant-leadership culture that enables squads to self-serve while maintaining guardrails
Collaborate closely with application engineering teams, other SRE squads, and stakeholders across security, compliance, and architecture
Promote knowledge sharing through strong documentation and enablement around platform usage and best practices
Provide technical mentorship and guidance to platform engineers, supporting engineering excellence and growth
Support the Squad Manager in planning, prioritization, and execution of platform initiatives
Ensure work is visible, well-documented, and aligned with broader SRE and company objectives

Requirements

5+ years of relevant work experience with a Bachelor’s Degree
Proven experience in Platform Engineering and/or SRE roles, with demonstrated technical leadership
Strong hands-on experience with public cloud platforms (AWS preferred; Azure is a plus)
Strong experience operating Kubernetes at scale (EKS or equivalent)
Experience with Service Mesh technologies (Istio preferred; App Mesh, Linkerd, etc. are a plus)
Solid understanding of SRE fundamentals, including SLIs/SLOs, error budgets, and reliability-driven prioritization
Strong experience with observability tooling and practices, including metrics, logging, tracing, alerting, and Golden Signals
Strong incident management and on-call operations experience, including escalation and problem management
Experience with Infrastructure as Code (e.g., Terraform) and cloud-native operational patterns
Strong understanding of cloud-native microservices architecture and platform enablement patterns
Ability to translate complex technical concepts into clear guidance for non-platform teams
Excellent collaboration, communication, and stakeholder management skills

Benefits

Health insurance
Flexible work arrangements
Professional development

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Platform Engineering, Site Reliability Engineering (SRE), Kubernetes, Service Mesh, AWS, Azure, Infrastructure as Code, Terraform, Observability tooling, Cloud-native microservices architecture

Soft Skills

Technical leadership, Collaboration, Communication, Stakeholder management, Mentorship, Servant-leadership, Documentation, Problem management, Prioritization, Execution

Certifications

Bachelor’s Degree