
Site Reliability Engineer
Datavail
full-time
Location Type: Remote
Location: Colombia
About the role
- Implement and maintain monitoring, alerting, and logging systems (Prometheus, Grafana, ELK, OpenTelemetry)
- Build and maintain CI/CD pipelines and automation for deployments and testing
- Support containerized workloads using Docker and Kubernetes; manage Helm charts and deployments
- Contribute to incident response, troubleshooting, and postmortem documentation
- Implement IaC patterns (Terraform, CloudFormation, ARM templates) under guidance
- Collaborate with developers to improve service reliability and operational readiness
- Participate in continuous platform improvements led by senior/principal engineers
Requirements
- 3–5 years of experience in operations, DevOps, or SRE roles
- Hands-on experience with containers and orchestration (Docker, Kubernetes)
- Familiarity with IaC tools (Terraform, Ansible, or similar)
- Experience with CI/CD tools (Jenkins, GitHub Actions, ArgoCD, or similar)
- Proficiency in at least one scripting language (Python, Bash, Go)
- Associate-level cloud certification (AWS, Azure, GCP, Oracle, or Cloud+)
- Availability for weekend and holiday shifts as part of the standard scheduling rotation
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
monitoring systems, alerting systems, logging systems, CI/CD pipelines, automation, containerization, orchestration, Infrastructure as Code, scripting, incident response
Soft Skills
collaboration, troubleshooting, documentation, service reliability, operational readiness
Certifications
Associate Level Cloud Certification, AWS, Azure, GCP, Oracle, Cloud+