Banco ABC Brasil

Senior Site Reliability Engineer – AWS

Employment Type: Full-time

Location Type: Hybrid

Location: São Paulo, Brazil

About the role

  • Reliability: Ensure high availability, reliability, and performance of our services, acting as the guardian of the environment's SLAs and SLOs.
  • Operations & Support: Actively participate in incident resolution and service request handling, providing full support and ensuring continuous operation in both production and staging/testing environments.
  • Capacity & Cloud: Manage environment capacity (CPU, memory, and disk) and support internal teams in the optimized use of cloud services, containers, and pipelines.
  • Orchestration: Operate workloads on Kubernetes (EKS) with the ability to perform advanced troubleshooting.
  • Observability: Develop and refine our monitoring and observability stack (CloudWatch, Prometheus, Grafana, Datadog or equivalents).
  • Security & FinOps: Implement cloud security and compliance best practices (DevSecOps/auditing) and lead cost and performance optimization initiatives (FinOps, tagging, auto-scaling, budget alerts).
  • Automation & CI/CD: Design, implement and maintain complex CI/CD pipelines (GitHub Actions, Jenkins or Azure DevOps).
  • IaC: Actively contribute to the evolution of the infrastructure using robust Infrastructure as Code practices.
  • Continuous Improvement: Analyze root causes of critical incidents and propose structural improvements (post-mortems), as well as create and update operational documentation and runbooks.

Requirements

  • Systems and Networking (The Foundation): Solid knowledge of OS administration (advanced Linux, CLI, Windows environments) and network architecture (IP, DNS, ports, routing, VPCs, subnets).
  • Cloud Fundamentals and Services: Strong understanding of computing models (IaaS, PaaS, SaaS) and extensive hands-on experience with core services (compute, storage, databases, serverless) on AWS, Azure, or GCP, including hybrid environments.
  • Version Control and Scripting: Advanced use of Git (branching strategies, pull requests) and fluency in scripting for task automation (Python, Bash or PowerShell).
  • Infrastructure and Configuration (IaC): Advanced experience in Infrastructure as Code, with expertise in Terraform and/or CloudFormation.
  • Security and Access (IAM): Deep expertise in cloud identity and network management (roles, policies, security groups, advanced IAM, audit readiness).
  • Containers and Orchestration: Expertise in building and managing containers (Docker: build, run, push) and extensive experience administering and troubleshooting Kubernetes clusters (EKS, AKS or upstream).
  • CI/CD Pipelines: Ability to design and manage end-to-end automation pipelines (Jenkins, GitHub Actions, Azure DevOps).
  • Advanced Observability: Proficiency with tools such as Prometheus, Grafana, CloudWatch and Datadog for proactive and reactive analysis of complex environments.
  • FinOps: Experience reviewing and optimizing workloads with a cost-effectiveness focus (auto-scaling, right-sizing).

Benefits

  • Medical insurance
  • Dental insurance (Omint)
  • Life insurance
  • Profit-sharing (PLR)
  • PPR (Performance-Based Bonus)
  • ABC with You: a program providing support for employees and their families, including legal, social, psychological and financial assistance
  • Meal allowance
  • Food allowance
  • Extended parental leave: 20 days paternity and 6 months maternity leave
  • Childcare support / nanny subsidy
  • Annual day off
  • Home office allowance
  • Home office infrastructure/equipment allowance
  • TotalPass (wellness benefits platform)

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Kubernetes, AWS, Azure, GCP, Terraform, CloudFormation, Python, Bash, Git, CI/CD

Soft Skills

Incident resolution, continuous improvement, root cause analysis, documentation, troubleshooting, optimization