Ensure the stability, reliability and performance of applications and cloud infrastructure;
Work proactively by designing self-healing solutions;
Perform debugging and troubleshooting on complex systems and distributed applications;
Monitor applications and infrastructure, configuring alerts, dashboards and logs for observability;
Document procedures and create clear runbooks and playbooks for incidents and operations;
Collaborate with cross-functional teams, maintaining effective communication and alignment with business objectives;
Support the implementation and maintenance of CI/CD pipelines and evaluate their operational impact;
Apply SRE principles, including defining SLOs, SLIs, error budgets and reducing toil;
Actively participate in incident resolution and the continuous improvement of infrastructure and processes;
Assist in defining and implementing cloud security, compliance and governance best practices.
Requirements
3–5+ years of experience in technical operations, SRE or advanced IT support;
Hands-on experience with Google Cloud Platform (GCP);
Scripting and automation skills (Python, Bash);
Experience with monitoring and logging tools (Prometheus, Grafana, Cloud Monitoring, ELK Stack, Cloud Logging);
Experience with microservices architectures and supporting distributed applications;
Knowledge of CI/CD pipelines and their operational impact;
Strong understanding of cloud computing and Infrastructure as Code;
Experience in technical documentation and creating runbooks;
Fluency in Portuguese and professional proficiency in English.
Desired
Google Cloud certifications (Associate Cloud Engineer, Professional Cloud DevOps Engineer);
SRE or DevOps certifications;
Knowledge of ITIL principles, especially incident and problem management;
Experience with operations automation solutions, advanced observability and DevOps/SRE practices.