Specialist II – Site Reliability Engineering, Command Center
Itaú Unibanco
Site Reliability Engineer at Itaú responsible for ensuring service reliability and stability, collaborating with technology and business teams, and leading observability and innovation initiatives.
Tech Stack
Tools & technologies: AWS, Cloud, Python
About the role
Key responsibilities & impact
- Lead troubleshooting across the software development lifecycle, collaborating with development, architecture, platform, and operations teams to drive continuous improvements;
- Manage critical incident response, contributing to reduced impact, lower MTTR and decreased recurrence;
- Act as the reliability expert for a specific Business Unit, developing deep knowledge of its journeys and critical services;
- Map risks, weaknesses and opportunities for improvement through the lens of Site Reliability Engineering (SRE) principles;
- Identify and lead initiatives to increase availability, stability, performance, and resilience of systems;
- Define, track, and evolve reliability metrics such as SLIs, SLOs, and Error Budgets;
- Lead root cause analyses and structured remediation plans for recurring issues;
- Promote adoption of automation to reduce manual operational work (toil) and improve operational efficiency;
- Explore and implement AI- and AIOps-based solutions for incident prevention, detection, and resolution;
- Develop a proactive operational perspective, anticipating risks before they impact customers and the business.
Requirements
What you’ll need
- Experience with Site Reliability Engineering (SRE) practices and experience operating in Command Centers, NOCs, or Operations Centers;
- Proven, in-depth experience across stages of the software development lifecycle;
- Strong knowledge of infrastructure and AWS Cloud;
- Fluent English for leading war rooms and technical meetings;
- Knowledge of observability, monitoring, and incident management;
- Experience with automation using languages such as Python, Shell Script, or similar;
- Knowledge of distributed architecture, microservices, and critical systems;
- Experience with observability and monitoring tools;
- Familiarity with container platforms and orchestration;
- Knowledge of Artificial Intelligence applied to operations, observability, and automation.
Benefits
Comp & perks
- Transportation allowance
- Meal voucher (restaurants) / Food voucher (supermarkets)
- Medical plan (Fundação Saúde Itaú or Central Nacional Unimed)
- Dental plan (Odontoprev or Interodonto)
- Life insurance
- Profit-sharing (PLR) – subject to the bank's results
- Private pension
- Exclusive discounts on our financial products
- Extended maternity and paternity leave
- Childcare / nanny allowance (for parents)
- Education assistance
- Wellhub or TotalPass
- Access to Itaú Leisure Clubs (Itanhaém and São Sebastião)
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering (SRE), AWS Cloud, Python, Shell Script, observability, monitoring, incident management, distributed architecture, microservices, automation
Soft Skills
collaboration, leadership, communication, problem-solving, proactive thinking