Mid-Level SRE (SRE Pleno) – Afternoon/Evening

Banco ABC Brasil

Mid-level SRE (Pleno) role focused on hybrid operations and incident management at Banco ABC Brasil. The role works collaboratively to ensure the operational health of cloud and on-premises environments.

Posted 5/19/2026 • Full-time • São Paulo • 🇧🇷 Brazil • Mid-Level • Senior

Tech Stack

Tools & technologies

AWS, Azure, Cloud, DNS, Firewalls, Google Cloud Platform, Grafana, Linux, TCP/IP

About the role

Key responsibilities & impact

Act as the first-response point (N1/N2) for incident handling in cloud (AWS, Azure, GCP) and on-premises environments, performing triage, severity classification and formal logging following ITIL.
Perform initial incident diagnosis, investigating root causes using logs, metrics and observability events (Zabbix, Grafana, CloudWatch and Dynatrace).
Escalate correctly to N2/N3 when the incident exceeds the level's scope, ensuring accurate handover of information and context.
Document all incidents accurately: symptoms, actions taken, resolution, recovery time and lessons learned, contributing to the team's knowledge base.
Participate in on-call rotations, ensuring coverage and response times within established SLAs.
Continuously monitor infrastructure dashboards and alerts, acting proactively before degradations become critical incidents.
Investigate capacity, performance, availability and storage alerts in cloud (AWS, Azure, GCP) and on-premises environments, taking corrective actions or escalating with full context.
Fulfill infrastructure requests (provisioning, resource adjustments, access creation, configurations) within established deadlines and standards.
Execute routine operational tasks: patches, backups, capacity checks, cleanup of obsolete resources and inventory updates.
Plan, document and execute Change Management (GMUD) activities in production environments, following the ITIL Change Management process.
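
As an illustration of the incident documentation described above (symptoms, actions taken, resolution, recovery time and lessons learned), the following is a minimal Python sketch of an incident record. The class name, field names and the numeric severity scale are hypothetical, chosen only for illustration; they are not taken from Banco ABC Brasil's tooling or process.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class IncidentRecord:
    """Hypothetical incident record; all field names are illustrative."""
    symptoms: str
    opened_at: datetime
    severity: int  # assumed scale: 1 = most critical
    actions_taken: List[str] = field(default_factory=list)
    resolution: Optional[str] = None
    resolved_at: Optional[datetime] = None
    lessons_learned: Optional[str] = None

    def resolve(self, resolution: str, resolved_at: datetime) -> None:
        """Record how and when the incident was resolved."""
        if resolved_at < self.opened_at:
            raise ValueError("resolved_at cannot be earlier than opened_at")
        self.resolution = resolution
        self.resolved_at = resolved_at

    def recovery_time(self) -> Optional[timedelta]:
        """Time from opening to resolution, or None while still open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.opened_at


if __name__ == "__main__":
    record = IncidentRecord(
        symptoms="High disk usage alert on application server",
        opened_at=datetime(2026, 1, 1, 10, 0),
        severity=3,
    )
    record.actions_taken.append("Removed obsolete log files")
    record.resolve("Disk usage back under threshold", datetime(2026, 1, 1, 10, 45))
    print(record.recovery_time())  # prints 0:45:00
```

Keeping the open and resolve timestamps on the record means recovery time is derived rather than typed in by hand, which helps keep the knowledge base consistent.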

Requirements

What you’ll need

Proven experience operating AWS cloud production environments, with the ability to diagnose and resolve incidents without constant supervision.
Strong knowledge of Linux and Windows Server: administration, logs, service troubleshooting and connectivity.
Experience with observability tools (Zabbix, Grafana or CloudWatch) for alert investigation and event correlation.
Experience applying ITIL: incident opening, classification and resolution; executing GMUDs with rollback plans.
Active Directory: user and group creation, GPOs, authentication troubleshooting.
Basic networking (TCP/IP, DNS, DHCP, VPN, firewalls, VLANs), sufficient to diagnose connectivity issues.
Operational-level Bash or PowerShell for automating routine tasks.
Degree in Computer Science, Network Engineering, Information Systems, Systems Analysis and Development or related fields.
Candidates still completing a degree will be considered if they fully meet the practical experience requirements and hold at least one technical certification.

Benefits

Comp & perks

Medical Insurance
Dental Insurance (Omint)
Life Insurance
Profit Sharing (PLR)
Performance Bonus (PPR)
"ABC with You": a program supporting employees and their families with legal, social, psychological and financial assistance
Meal Voucher
Food Voucher
Extended Parental Leave: 20 days paternity and 6 months maternity
Childcare/Babysitter Allowance
Annual Day Off
Home Office Infrastructure Allowance
TotalPass

ATS Keywords

✓ Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

AWS, Azure, GCP, Linux, Windows Server, Bash, PowerShell, ITIL, Active Directory, networking

Soft Skills

incident handling, problem solving, communication, documentation, team collaboration, proactive monitoring, time management, attention to detail, escalation management, change management