Senior Site Reliability Engineer

Visa

full-time

Posted on: 3/6/2026

Location Type: Remote

Location: Brazil

Job Level

Senior

Tech Stack

AWS, Azure, Bootstrap, Cloud, Distributed Systems, Kubernetes, Terraform

About the role

Own the end‑to‑end lifecycle (design, provisioning, upgrades, maintenance, and decommissioning) of core platform components, including: cloud infrastructure primitives; Kubernetes clusters and cluster services; networking, ingress, and service discovery; and Service Mesh and supporting data‑plane components.
Design platform components to be resilient by default, applying SRE principles such as: fault isolation and graceful degradation; capacity planning and saturation control; and reduced operational toil and clear failure modes.
Lead the design and implementation of infrastructure bootstrap orchestration, including: automated cluster and environment provisioning; deterministic, repeatable platform bring‑up and teardown; and dependency‑aware orchestration across cloud, network, and Kubernetes layers.
Drive Infrastructure‑as‑Code and GitOps‑first practices to ensure that: platform components are reproducible and auditable; changes are automated, testable, and reversible; and manual intervention is minimized or eliminated.
Identify automation gaps and lead initiatives that reduce human effort, onboarding time, and operational risk.
Apply and promote SRE operational excellence practices, including: clear ownership and runbooks for platform components; participation in on‑call rotation as a platform reliability escalation point; and incident response, post‑incident reviews, and problem management.
Improve day‑2 operations by standardizing upgrade/rollback strategies and reducing MTTD/MTTR.
Ensure platform operations align with security, compliance, and internal control requirements.
Collaborate with engineering teams across the organization to influence platform adoption, reliability standards, and cloud‑native best practices.

Requirements

Proficiency in English at B2 level or above (Upper-Intermediate)
Strong hands‑on experience with public cloud platforms (AWS preferred, Azure also considered)
Proven experience operating and administering Kubernetes at scale in production environments
Strong experience with container orchestration platforms and cloud architecture fundamentals (networking, IAM/security concepts, and reliability patterns)
Experience with Infrastructure as Code (Terraform preferred) and automation‑first workflows
Familiarity with GitOps practices and CI/CD pipelines
Strong troubleshooting skills for distributed systems, including root‑cause analysis and reliability improvements
Experience with observability concepts and practices (monitoring, logging, alerting, tracing)
Experience with Service Mesh technologies (Istio preferred; App Mesh or Linkerd also considered)
Experience working with critical or mission‑critical systems
Strong background applying SRE principles (operational readiness, incident management, runbooks, toil reduction)
AWS certifications

Benefits

Remote work options

Applicant Tracking System Keywords

Hard Skills & Tools

Kubernetes, Infrastructure as Code, Terraform, GitOps, CI/CD, Service Mesh, Istio, AWS, Azure, observability

Soft Skills

troubleshooting, root-cause analysis, collaboration, leadership, incident management, operational readiness, problem management, capacity planning, communication, reliability improvements

Certifications

AWS certifications