Site Reliability Engineer, SRE

Denvr

full-time

Location Type: Remote

Location: Remote • 🇨🇦 Canada

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Cloud, DNS, Go, Grafana, Kubernetes, Linux, Prometheus, Python, Shell Scripting, TCP/IP, Terraform

About the role

  • Design, implement, and maintain observability systems with Grafana, Prometheus, VictoriaMetrics, and PromQL to monitor system health and performance.
  • Explore opportunities to improve the overall observability of the HPC environment using industry best practices.
  • Participate in on-call rotations, rapidly diagnose and resolve incidents, and perform postmortem reviews to drive continuous improvements.
  • Hands-on experience automating DevOps pipelines using GitHub Actions (or similar tools).

Requirements

  • 3-5 years of experience in a Site Reliability Engineering (SRE) or DevOps role.
  • Strong software development background and computer science fundamentals.
  • Familiarity with tools such as Terraform, Helm, Ansible, or Python for automated infrastructure provisioning.
  • Knowledge of security practices and compliance standards for enterprise environments.
  • Familiarity with high-performance computing (HPC), specifically administering GPU workloads.
  • Strong experience in managing Kubernetes clusters in production environments.
  • Expertise in observability platforms (Grafana, Prometheus, PromQL) for tracking and analyzing system metrics.
  • Solid understanding of networking fundamentals (TCP/IP, DNS, load balancing, VPNs).
  • Hands-on experience developing and deploying production-grade applications on AWS within a hybrid cloud architecture.
  • Proficiency in Linux administration, shell scripting, and performance tuning.
  • Strong software development skills (e.g., Bash, Python, Golang) to automate infrastructure and operational tasks.

Benefits

  • Competitive salary
  • Flexible working hours
  • Professional development opportunities

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills

Grafana, Prometheus, VictoriaMetrics, PromQL, GitHub Actions, Terraform, Helm, Ansible, Python, Kubernetes

Soft skills

problem-solving, incident resolution, continuous improvement, collaboration