SRE, AI Automation Engineer

EdgeUno

Join EdgeUno as an SRE & AI Automation Engineer, focusing on cloud infrastructure in Latin America. Work on Kubernetes, observability, and AI-powered automation workflows across a multinational team in a hybrid setup.

Posted 5/23/2026 • Full-time • Uberlândia, 🇧🇷 Brazil • Mid-Level, Senior

Tech Stack

Tools & technologies

Ansible, Cloud, Flux, Grafana, Kubernetes, Prometheus, Python, Terraform, TypeScript

About the role

Key responsibilities & impact

Define and implement SLOs, SLIs, and reliability practices across cloud services
Build and maintain observability environments using Prometheus, Grafana, Alertmanager, Loki, and related tooling
Reduce operational toil through automation and infrastructure engineering initiatives
Support incident management processes, post-mortems, runbooks, and operational workflows
Collaborate on Kubernetes operations, cluster lifecycle management, and infrastructure scalability
Implement GitOps workflows using tools such as ArgoCD, Flux, and Infrastructure-as-Code frameworks
Design and develop AI-powered operational tools and internal assistants
Build automation workflows integrating cloud APIs, ticketing systems, Slack, dashboards, and operational platforms
Integrate LLMs and AI services into internal workflows using APIs and RAG architectures
Develop AI-driven reporting, incident summarization, and operational intelligence solutions
Evaluate and prototype agentic AI frameworks and automation platforms
Develop Infrastructure-as-Code environments using Terraform, Ansible, and related technologies
Build CI/CD pipelines and infrastructure validation workflows
Automate provisioning, upgrades, monitoring, and infrastructure operations across distributed environments
Improve deployment reliability and operational visibility across cloud services
Help establish SRE best practices across engineering teams
Collaborate with infrastructure, support, operations, and leadership teams to identify automation opportunities
Maintain clear technical documentation for systems, workflows, and operational processes
Support tooling evaluation and technical decision-making related to cloud infrastructure and AI operations

Requirements

What you’ll need

English proficiency at B2 level or higher
5+ years of experience in SRE, DevOps, Platform Engineering, or related infrastructure roles
Strong experience with observability and monitoring stacks such as Prometheus, Grafana, Alertmanager, Loki, or equivalent
Hands-on experience building or integrating AI/LLM-powered applications, tools, or workflows
Strong proficiency in Python and/or TypeScript
Experience operating Kubernetes environments in production
Experience with Infrastructure-as-Code and automation tooling such as Terraform, Ansible, ArgoCD, or similar
Strong understanding of SLOs, SLIs, reliability engineering, and operational best practices

Benefits

Comp & perks

Opportunity to work on strategic cloud and AI infrastructure initiatives across Latin America
Direct exposure to modern cloud-native, Kubernetes, and AI-driven operational environments
Close collaboration with Cloud Engineering leadership and product strategy initiatives
Multinational and multicultural team environment across LATAM

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

SLOs, SLIs, reliability engineering, Python, TypeScript, Kubernetes, Infrastructure-as-Code, Terraform, Ansible, CI/CD

Soft Skills

collaboration, incident management, technical documentation, automation, problem-solving