Infios

Cloud Reliability Engineer

full-time

Location Type: Remote

Location: Remote • 🇧🇷 Brazil

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Google Cloud Platform, Kubernetes, Python, Terraform

About the role

  • Operate, maintain, and improve cloud infrastructure in AWS, Azure, or GCP environments.
  • Manage and optimize Kubernetes clusters, including deployment, scaling, patching, and upgrades.
  • Ensure system availability, scalability, and performance through proactive monitoring and optimization.
  • Maintain infrastructure-as-code (IaC) for consistent and repeatable deployments.
  • Identify opportunities for operational automation to eliminate manual processes (“reduce toil”).
  • Build and maintain automated pipelines for deployments, configuration, and remediation.
  • Develop self-healing mechanisms to automatically detect and resolve common service issues.
  • Design proactive monitoring, alerting, and observability dashboards (Dynatrace, Datadog).
  • Collaborate with DevOps and development teams to build reliable, observable, and resilient systems.
  • Monitor, troubleshoot, and resolve infrastructure and application issues.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • 5+ years of experience in Cloud Engineering, DevOps, or Site Reliability roles.
  • Hands-on experience with cloud platforms (OCI, AWS, Azure, or GCP).
  • Strong knowledge of Kubernetes deployment, management, and troubleshooting.
  • Solid understanding of observability and monitoring tools (e.g., Dynatrace, Datadog) and incident management platforms.
  • Proficiency in scripting and automation (e.g., Python, Bash, Terraform, Ansible).
  • Strong troubleshooting and analytical skills across infrastructure and applications.
  • Experience with incident response, RCA, and postmortem processes.
  • A mindset of continuous improvement, reliability, and self-healing automation.
  • Understanding of SRE principles, SLAs/SLOs/SLIs, and chaos engineering practices.

Benefits

  • Competitive salary
  • Flexible working hours
  • Professional development budget
  • Home office setup allowance
  • Global team events

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
cloud infrastructure, Kubernetes, infrastructure-as-code, automation, scripting, Python, Bash, Terraform, Ansible, observability
Soft skills
troubleshooting, analytical skills, continuous improvement, collaboration, reliability, self-healing automation