Site Reliability Engineer

Datavail

Full-time

Posted on: 1/29/2026

Location Type: Remote

Location: Colombia

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Docker, Go, Google Cloud Platform, Grafana, Jenkins, Kubernetes, Oracle, Prometheus, Python, Terraform

About the role

Implement and maintain monitoring, alerting, and logging systems (Prometheus, Grafana, ELK, OpenTelemetry)
Build and maintain CI/CD pipelines and automation for deployments and testing
Support containerized workloads using Docker and Kubernetes; manage Helm charts and deployments
Contribute to incident response, troubleshooting, and postmortem documentation
Implement IaC patterns (Terraform, CloudFormation, ARM templates) under guidance
Collaborate with developers to improve service reliability and operational readiness
Participate in continuous platform improvements led by senior/principal engineers

Requirements

3–5 years of experience in operations, DevOps, or SRE roles
Hands-on experience with containers and orchestration (Docker, Kubernetes)
Familiarity with IaC tools (Terraform, Ansible, or similar)
Experience with CI/CD tools (Jenkins, GitHub Actions, ArgoCD, or similar)
Proficiency in at least one scripting language (Python, Bash, Go)
Associate-level cloud certification (AWS, Azure, GCP, Oracle, or CompTIA Cloud+)
This position requires availability for weekend and holiday shifts as part of the standard scheduling rotation

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

monitoring systems, alerting systems, logging systems, CI/CD pipelines, automation, containerization, orchestration, Infrastructure as Code, scripting, incident response

Soft Skills

collaboration, troubleshooting, documentation, service reliability, operational readiness

Certifications

Associate Level Cloud Certification, AWS, Azure, GCP, Oracle, Cloud+