
About the role
- Excellence in Reliability: Implement operational initiatives so that every delivery and solution is built for resilience and scalability.
- Platform Maintenance (Self-Service): Maintain and evolve the abstractions and automations that simplify deployment and infrastructure management, so the technology team can move quickly without compromising security.
- Focus on Automation: Proactively identify manual and repetitive operational tasks, developing scripts and tools to automate them and optimize the Ops team’s time.
- Continuous Process Improvement: Analyze existing operational workflows and propose technical improvements that increase predictability, security, and stability of the production environment.
- Incident Autonomy: Take a leading role in resolving critical incidents, performing complex troubleshooting, and actively contributing to root-cause documentation (post-mortems).
- Change Execution and Architecture: Execute changes in production environments and review technical proposals, ensuring adherence to the company’s security, cost (FinOps), and governance standards.
- Promotion of SRE Culture: Support the technical growth of less-experienced team members and collaborate on knowledge sharing about modern infrastructure and distributed systems.
Requirements
- Container Expertise: Experience managing and optimizing Amazon ECS and EKS clusters, ensuring application health and orchestrator efficiency. Solid Amazon ECS experience is expected, including service configuration, task definitions, auto-scaling, and integration with load balancers.
- Database Administration: Hands-on experience with MySQL and PostgreSQL (RDS/Aurora), focusing on performance, query troubleshooting, and high availability.
- Observability with New Relic: Proficiency with the platform, including APM and dashboards, and experience using NRQL to create intelligent alerts and manage SLOs/SLIs.
- Infrastructure Security: Direct involvement in CVE remediation, container image hardening, and applying security patches.
- Infrastructure as Code (IaC): Strong knowledge of Terraform and Ansible for automating AWS environments.
- CI/CD Experience: Operating automation pipelines, preferably using GitLab CI/CD.
- AWS Expertise: Advanced knowledge of core AWS services and FinOps strategies (cost optimization).
Benefits
- Health insurance;
- Dental insurance;
- Meal allowance or food voucher;
- Childcare assistance;
- Transportation voucher;
- Profit-sharing program (PPR);
- Birthday day off;
- Life insurance;
- Wellhub;
- Férias&Co (travel benefit);
- 6-month maternity leave and 20-day paternity leave;
- Flexible working hours;
- #Secuida - our Quality of Life Program;
- Partnerships with various establishments and institutions in education, health, leisure, entertainment, and more.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Amazon ECS, Amazon EKS, MySQL, PostgreSQL, New Relic, Terraform, Ansible, CI/CD, GitLab CI/CD, Infrastructure as Code
Soft skills
problem-solving, collaboration, leadership, communication, process improvement, incident management, technical mentoring, agility, troubleshooting, knowledge sharing