Datavail

Site Reliability Engineer

Full-time

Location Type: Remote

Location: Colombia

About the role

  • Implement and maintain monitoring, alerting, and logging systems (Prometheus, Grafana, ELK, OpenTelemetry)
  • Build and maintain CI/CD pipelines and automation for deployments and testing
  • Support containerized workloads using Docker and Kubernetes; manage Helm charts and deployments
  • Contribute to incident response, troubleshooting, and postmortem documentation
  • Implement Infrastructure as Code (IaC) patterns (Terraform, CloudFormation, ARM templates) under guidance
  • Collaborate with developers to improve service reliability and operational readiness
  • Participate in continuous platform improvements led by senior/principal engineers

Requirements

  • 3–5 years of experience in operations, DevOps, or SRE roles
  • Hands-on experience with containers and orchestration (Docker, Kubernetes)
  • Familiarity with IaC tools (Terraform, Ansible, or similar)
  • Experience with CI/CD tools (Jenkins, GitHub Actions, ArgoCD, or similar)
  • Proficiency in at least one scripting language (Python, Bash, Go)
  • Associate-level cloud certification (AWS, Azure, GCP, Oracle, or CompTIA Cloud+)
  • Availability for weekend and holiday shifts as part of the standard scheduling rotation