
Senior Site Reliability Engineer – AWS
Banco ABC Brasil
Full-time
Location Type: Hybrid
Location: São Paulo • Brazil
About the role
- Reliability: Ensure high availability, reliability, and performance of our services, acting as the guardian of the environment's SLAs and SLOs.
- Operations & Support: Actively participate in incident resolution and service request handling, providing full support and ensuring continuous operation in both production and staging/testing environments.
- Capacity & Cloud: Manage environment capacity (CPU, memory, and disk) and help internal teams make optimal use of cloud services, containers, and pipelines.
- Orchestration: Operate workloads on Kubernetes (Amazon EKS) and perform advanced troubleshooting.
- Observability: Develop and refine our monitoring and observability stack (CloudWatch, Prometheus, Grafana, Datadog or equivalents).
- Security & FinOps: Implement cloud security and compliance best practices (DevSecOps/auditing) and lead cost and performance optimization initiatives (FinOps, tagging, auto-scaling, budget alerts).
- Automation & CI/CD: Design, implement and maintain complex CI/CD pipelines (GitHub Actions, Jenkins or Azure DevOps).
- IaC: Actively contribute to the evolution of the infrastructure using robust Infrastructure as Code practices.
- Continuous Improvement: Analyze the root causes of critical incidents, propose structural improvements (post-mortems), and create and maintain operational documentation and runbooks.
Requirements
- Systems and Networking (The Foundation): Solid knowledge of OS administration (advanced Linux, CLI, Windows environments) and network architecture (IP, DNS, ports, routing, VPCs, subnets).
- Cloud Fundamentals and Services: Strong understanding of computing models (IaaS, PaaS, SaaS) and extensive hands-on experience with core services (compute, storage, databases, serverless) on AWS, Azure, or GCP, including hybrid environments.
- Version Control and Scripting: Advanced use of Git (branching strategies, pull requests) and fluency in scripting for task automation (Python, Bash or PowerShell).
- Infrastructure and Configuration (IaC): Advanced experience in Infrastructure as Code, with expertise in Terraform and/or CloudFormation.
- Security and Access (IAM): Deep expertise in cloud identity and network management (roles, policies, security groups, advanced IAM, audit readiness).
- Containers and Orchestration: Expertise in building and managing containers (Docker: build, run, push) and extensive experience administering and troubleshooting Kubernetes clusters (EKS, AKS or upstream).
- CI/CD Pipelines: Ability to design and manage end-to-end automation pipelines (Jenkins, GitHub Actions, Azure DevOps).
- Advanced Observability: Proficiency with tools such as Prometheus, Grafana, CloudWatch and Datadog for proactive and reactive analysis of complex environments.
- FinOps: Experience reviewing and optimizing workloads with a cost-effectiveness focus (auto-scaling, right-sizing).
Benefits
- Medical insurance;
- Dental insurance (Omint);
- Life insurance;
- Profit-sharing (PLR);
- PPR (Performance-Based Bonus);
- ABC with You: a program providing support for employees and their families, including legal, social, psychological and financial assistance;
- Meal allowance;
- Food allowance;
- Extended parental leave: 20 days of paternity leave and 6 months of maternity leave;
- Childcare support / nanny subsidy;
- Annual day off;
- Home office allowance;
- Home office infrastructure/equipment allowance;
- TotalPass (wellness benefits platform).
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, AWS, Azure, GCP, Terraform, CloudFormation, Python, Bash, Git, CI/CD
Soft Skills
incident resolution, continuous improvement, root cause analysis, documentation, troubleshooting, optimization