
Cloud Reliability Engineer
Infios
full-time
Location Type: Remote
Location: Remote • 🇧🇷 Brazil
Job Level
Mid-Level, Senior
Tech Stack
Ansible, AWS, Azure, Cloud, Google Cloud Platform, Kubernetes, Python, Terraform
About the role
- Operate, maintain, and improve cloud infrastructure in AWS, Azure, or GCP environments.
- Manage and optimize Kubernetes clusters, including deployment, scaling, patching, and upgrades.
- Ensure system availability, scalability, and performance through proactive monitoring and optimization.
- Maintain infrastructure-as-code (IaC) for consistent and repeatable deployments.
- Identify opportunities for operational automation to eliminate manual processes (“reduce toil”).
- Build and maintain automated pipelines for deployments, configuration, and remediation.
- Develop self-healing mechanisms to automatically detect and resolve common service issues.
- Design proactive monitoring, alerting, and observability dashboards (Dynatrace, Datadog).
- Collaborate with DevOps and development teams to build reliable, observable, and resilient systems.
- Monitor, troubleshoot, and resolve infrastructure and application issues.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5+ years of experience in Cloud Engineering, DevOps, or Site Reliability roles.
- Hands-on experience with cloud platforms (OCI, AWS, Azure, or GCP).
- Strong knowledge of Kubernetes deployment, management, and troubleshooting.
- Solid understanding of observability and monitoring tools (e.g., Dynatrace, Datadog) and incident management platforms.
- Proficiency in scripting and automation (e.g., Python, Bash, Terraform, Ansible).
- Strong troubleshooting and analytical skills across infrastructure and applications.
- Experience with incident response, RCA, and postmortem processes.
- A mindset of continuous improvement, reliability, and self-healing automation.
- Understanding of SRE principles, SLAs/SLOs/SLIs, and chaos engineering practices.
Benefits
- Competitive salary
- Flexible working hours
- Professional development budget
- Home office setup allowance
- Global team events
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
cloud infrastructure, Kubernetes, infrastructure-as-code, automation, scripting, Python, Bash, Terraform, Ansible, observability
Soft skills
troubleshooting, analytical skills, continuous improvement, collaboration, reliability, self-healing automation