Crunchafi

Site Reliability Engineer

Full-time

Location Type: Remote

Location: Wisconsin, United States

About the role

  • Design, build, and maintain scalable and resilient infrastructure on Microsoft Azure to support production SaaS workloads
  • Define and track service level objectives (SLOs), service level indicators (SLIs), and error budgets to drive reliability decisions
  • Build and maintain comprehensive monitoring, alerting, and observability systems to ensure early detection of issues
  • Develop and maintain CI/CD pipelines using GitHub Actions to enable safe, rapid, and repeatable deployments
  • Lead incident response and on-call rotations, conduct blameless post-incident reviews, and drive follow-up action items to completion
  • Automate operational tasks and eliminate toil through scripting, infrastructure-as-code, and self-healing systems
  • Manage and optimize Azure Kubernetes Service (AKS) clusters, container orchestration, and related networking and storage configurations
  • Collaborate with software engineering teams to embed reliability into application architecture, including capacity planning, load testing, and chaos engineering
  • Maintain and improve infrastructure-as-code using tools such as Terraform, Bicep, or ARM templates
  • Partner cross-functionally with Product, Support, and Quality to reduce friction and accelerate delivery

Requirements

  • 5+ years of professional experience in site reliability engineering, DevOps, or infrastructure engineering roles
  • Strong hands-on experience with Microsoft Azure cloud services (AKS, Azure SQL, App Services, Virtual Networks, Azure Monitor, etc.)
  • Proficiency in at least one programming or scripting language (Python, Go, Bash, PowerShell, or C#)
  • Experience designing and managing CI/CD pipelines using GitHub Actions, Azure DevOps, or equivalent
  • Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes)
  • Demonstrated experience with infrastructure-as-code tools (e.g., Bicep and ARM templates)
  • Strong understanding of networking fundamentals, DNS, load balancing, and TLS/SSL management
  • Experience with monitoring and observability platforms (Azure Monitor, Azure Monitor alerts, Application Insights, Seq, etc.)
  • Proven track record of managing production incidents, conducting post-mortems, and driving reliability improvements
  • Exceptional analytical, interpersonal, and communication skills

Benefits

  • Competitive salary
  • Health, dental, and vision plans
  • 401(k) retirement savings plan for US-based employees
  • 100% remote work environment, with occasional travel for in-person company and/or team meetings
  • Unlimited PTO
  • Significant professional development and growth opportunities
  • Dynamic and inclusive company culture with real commitment to our values