Visa

Manager, Site Reliability Engineer – Platform

full-time

Location Type: Remote

Location: Brazil

About the role

  • Act as the technical owner of the Platform Squad, defining, driving, and enforcing platform standards across the full lifecycle (design, rollout, upgrades, and decommissioning) for cloud infrastructure, Kubernetes, and Service Mesh
  • Ensure platform components are designed and operated according to SRE principles, focusing on reliability, scalability, and operational simplicity
  • Drive architectural decisions with a sustainable platform vision, balancing innovation, security, and operational stability
  • Define, build, and continuously improve operational processes for internal and external consumers, including platform onboarding and adoption; change management and release processes; and incident, problem, and escalation management
  • Act as a point of escalation for complex platform incidents and reliability risks, participating in on-call rotations as needed
  • Ensure platform operations comply with internal controls, audit requirements, and security standards
  • Establish and own platform observability standards, ensuring consistent implementation of the four Golden Signals: latency, traffic, errors, and saturation
  • Define and track platform SLIs, SLOs, and error budgets in partnership with internal consumers
  • Use metrics and operational data to drive prioritization, reliability improvements, and capacity planning decisions
  • Foster a collaborative, servant-leadership culture that enables squads to self-serve while maintaining guardrails
  • Collaborate closely with application engineering teams, other SRE squads, and stakeholders across security, compliance, and architecture
  • Promote knowledge sharing through strong documentation and enablement around platform usage and best practices
  • Provide technical mentorship and guidance to platform engineers, supporting engineering excellence and growth
  • Support the Squad Manager in planning, prioritization, and execution of platform initiatives
  • Ensure work is visible, well-documented, and aligned with broader SRE and company objectives

Requirements

  • 5+ years of relevant work experience with a Bachelor’s Degree
  • Proven experience in Platform Engineering and/or SRE roles, with demonstrated technical leadership
  • Strong hands-on experience with public cloud platforms (AWS preferred; Azure is a plus)
  • Strong experience operating Kubernetes at scale (EKS or equivalent)
  • Experience with Service Mesh technologies (Istio preferred; App Mesh, Linkerd, etc. are a plus)
  • Solid understanding of SRE fundamentals, including SLIs/SLOs, error budgets, and reliability-driven prioritization
  • Strong experience with observability tooling and practices, including metrics, logging, tracing, alerting, and Golden Signals
  • Strong incident management and on-call operations experience, including escalation and problem management
  • Experience with Infrastructure as Code (e.g., Terraform) and cloud-native operational patterns
  • Strong understanding of cloud-native microservices architecture and platform enablement patterns
  • Ability to translate complex technical concepts into clear guidance for non-platform teams
  • Excellent collaboration, communication, and stakeholder management skills

Benefits

  • Health insurance
  • Flexible work arrangements
  • Professional development