Coterie

Site Reliability Engineer

full-time

Location Type: Remote

Location: United States

Salary

💰 $120,000 - $155,000 per year

About the role

  • Manage and maintain cloud infrastructure on Azure, including Azure Kubernetes Service (AKS) clusters and supporting resources
  • Build, improve, and maintain CI/CD pipelines using GitHub Actions to support reliable and repeatable deployments
  • Own and enhance our Grafana implementation, including designing dashboards, configuring alerts, and supporting incident management workflows
  • Monitor system health, triage incidents, and drive root cause analysis to prevent recurrence
  • Collaborate with development teams to define and track SLIs, SLOs, and error budgets that align with business goals
  • Contribute to infrastructure-as-code practices using Pulumi
  • Identify and resolve reliability risks through capacity planning, performance tuning, and proactive system improvements
  • Participate in an on-call rotation to support production systems and respond to incidents
  • Document runbooks, operational procedures, and architectural decisions to support team knowledge sharing

Requirements

  • 3+ years of experience in a Site Reliability Engineering, DevOps, or Infrastructure role
  • Strong hands-on experience with:
      • Azure Cloud services and resource management
      • Kubernetes and AKS administration, including deployments, networking, and troubleshooting
      • GitHub Actions for CI/CD pipeline development and maintenance
  • Solid experience with Grafana, including dashboard creation, alerting configuration, and incident management
  • Hands-on experience with Prometheus, Loki, or other observability tools in the Grafana ecosystem
  • Proficiency in at least one scripting or programming language such as Python or Bash
  • Understanding of networking fundamentals, DNS, load balancing, and container orchestration concepts
  • Strong analytical and communication skills, with the ability to diagnose complex system issues and clearly communicate findings
  • Demonstrated ability to collaborate across teams and contribute to a culture of reliability
  • Experience working in an agile environment with modern DevOps practices

Benefits

  • 100% remote
  • Health insurance through Aetna (we pay 100% of premiums)
  • Dental and vision insurance through Guardian (we pay 100% of premiums)
  • Basic life insurance (we pay 100% of premiums)
  • Access to a flexible spending account (FSA) or, for those on HSA-eligible plans, a health savings account (HSA)
  • 401(k) plan (up to 4% match with immediate vesting)
  • Flexible PTO policy offering up to 3 weeks of time off during the first twelve months of employment
  • After the first year of employment, the PTO policy transitions to up to 4 to 5 weeks of time off annually
  • 12 company-paid holidays each year
  • Continuing education annual stipend