Site Reliability Engineer

Crunchafi

full-time

Posted on: 2/25/2026

Location Type: Remote

Location: Wisconsin • United States

Job Level

Mid-Level, Senior

Tech Stack

Azure, Cloud, DNS, Docker, Go, Kubernetes, Python, SQL, Terraform

About the role

Design, build, and maintain scalable and resilient infrastructure on Microsoft Azure to support production SaaS workloads
Define and track service level objectives (SLOs), service level indicators (SLIs), and error budgets to drive reliability decisions
Build and maintain comprehensive monitoring, alerting, and observability systems to ensure early detection of issues
Develop and maintain CI/CD pipelines using GitHub Actions to enable safe, rapid, and repeatable deployments
Lead incident response and on-call rotations, conduct blameless post-incident reviews, and drive follow-up action items to completion
Automate operational tasks and eliminate toil through scripting, infrastructure-as-code, and self-healing systems
Manage and optimize Azure Kubernetes Service (AKS) clusters, container orchestration, and related networking and storage configurations
Collaborate with software engineering teams to embed reliability into application architecture, including capacity planning, load testing, and chaos engineering
Maintain and improve infrastructure-as-code using tools such as Terraform, Bicep, or ARM templates
Partner cross-functionally with Product, Support, and Quality to reduce friction and accelerate delivery

Requirements

5+ years of professional experience in site reliability engineering, DevOps, or infrastructure engineering roles
Strong hands-on experience with Microsoft Azure cloud services (AKS, Azure SQL, App Services, Virtual Networks, Azure Monitor, etc.)
Proficiency in at least one programming or scripting language (Python, Go, Bash, PowerShell, or C#)
Experience designing and managing CI/CD pipelines using GitHub Actions, Azure DevOps, or equivalent
Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes)
Demonstrated experience with infrastructure-as-code tools (e.g., Bicep and ARM templates)
Strong understanding of networking fundamentals, DNS, load balancing, and TLS/SSL management
Experience with monitoring and observability platforms (Azure Monitor, Alerts, App Insights, Seq, etc.)
Proven track record of managing production incidents, conducting post-mortems, and driving reliability improvements
Exceptional analytical, interpersonal, and communication skills

Benefits

Competitive salary
Health, dental, and vision plans
401(k) retirement savings plan for US-based employees
100% remote work environment, with occasional travel for in-person company and/or team meetings
Unlimited PTO
Significant opportunities for professional development and growth
Dynamic and inclusive company culture with a real commitment to our values

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

site reliability engineering, DevOps, infrastructure engineering, Microsoft Azure, CI/CD pipelines, programming languages, scripting languages, infrastructure-as-code, containerization, orchestration technologies

Soft Skills

analytical skills, interpersonal skills, communication skills, leadership, collaboration, problem-solving, incident management, post-mortem analysis, reliability improvements, capacity planning