Senior Site Reliability Engineer – AWS

Banco ABC Brasil

Full-time

Posted on: 3/30/2026

Location Type: Hybrid

Location: São Paulo • Brazil

About the role

Reliability: Ensure high availability, reliability, and performance of our services, acting as the guardian of the environment's SLAs and SLOs.
Operations & Support: Actively participate in incident resolution and service request handling, providing full support and ensuring continuous operation in both production and staging/testing environments.
Capacity & Cloud: Manage environment capacity (CPU, memory, and disk) and help internal teams make efficient use of cloud services, containers, and pipelines.
Orchestration: Operate workloads on Kubernetes (EKS) with the ability to perform advanced troubleshooting.
Observability: Develop and refine our monitoring and observability stack (CloudWatch, Prometheus, Grafana, Datadog or equivalents).
Security & FinOps: Implement cloud security and compliance best practices (DevSecOps/auditing) and lead cost and performance optimization initiatives (FinOps, tagging, auto-scaling, budget alerts).
Automation & CI/CD: Design, implement and maintain complex CI/CD pipelines (GitHub Actions, Jenkins or Azure DevOps).
IaC: Actively contribute to the evolution of the infrastructure using robust Infrastructure as Code practices.
Continuous Improvement: Analyze the root causes of critical incidents, propose structural improvements through post-mortems, and create and maintain operational documentation and runbooks.

Requirements

Systems and Networking (The Foundation): Solid knowledge of OS administration (advanced Linux, CLI, Windows environments) and network architecture (IP, DNS, ports, routing, VPCs, subnets).
Cloud Fundamentals and Services: Strong understanding of computing models (IaaS, PaaS, SaaS) and extensive hands-on experience with core services (compute, storage, databases, serverless) on AWS, Azure, or GCP, including hybrid environments.
Version Control and Scripting: Advanced use of Git (branching strategies, pull requests) and fluency in scripting for task automation (Python, Bash or PowerShell).
Infrastructure and Configuration (IaC): Advanced experience in Infrastructure as Code, with expertise in Terraform and/or CloudFormation.
Security and Access (IAM): Deep expertise in cloud identity and network management (roles, policies, security groups, advanced IAM, audit readiness).
Containers and Orchestration: Expertise in building and managing containers (Docker: build, run, push) and extensive experience administering and troubleshooting Kubernetes clusters (EKS, AKS or upstream).
CI/CD Pipelines: Ability to design and manage end-to-end automation pipelines (Jenkins, GitHub Actions, Azure DevOps).
Advanced Observability: Proficiency with tools such as Prometheus, Grafana, CloudWatch and Datadog for proactive and reactive analysis of complex environments.
FinOps: Experience reviewing and optimizing workloads with a cost-effectiveness focus (auto-scaling, right-sizing).

Benefits

Medical insurance;
Dental insurance (Omint);
Life insurance;
Profit-sharing (PLR);
PPR (Performance-Based Bonus);
ABC with You: a program providing support for employees and their families, including legal, social, psychological and financial assistance;
Meal allowance;
Food allowance;
Extended parental leave: 20 days of paternity leave and 6 months of maternity leave;
Childcare support / nanny subsidy;
Annual day off;
Home office allowance;
Home office infrastructure/equipment allowance;
TotalPass (wellness benefits platform).
