
Site Reliability Engineer
Coterie
Full-time
Location Type: Remote
Location: United States
Salary
💰 $120,000 - $155,000 per year
About the role
- Manage and maintain cloud infrastructure on Azure, including Azure Kubernetes Service (AKS) clusters and supporting resources
- Build, improve, and maintain CI/CD pipelines using GitHub Actions to support reliable and repeatable deployments
- Own and enhance our Grafana implementation by designing dashboards, configuring alerts, and supporting incident management workflows
- Monitor system health, triage incidents, and drive root cause analysis to prevent recurrence
- Collaborate with development teams to define and track SLIs, SLOs, and error budgets that align with business goals
- Contribute to infrastructure-as-code practices using Pulumi
- Identify and resolve reliability risks through capacity planning, performance tuning, and proactive system improvements
- Participate in an on-call rotation to support production systems and respond to incidents
- Document runbooks, operational procedures, and architectural decisions to support team knowledge sharing
Requirements
- 3+ years of experience in a Site Reliability Engineering, DevOps, or Infrastructure role
- Strong hands-on experience with:
  - Azure Cloud services and resource management
  - Kubernetes and AKS administration, including deployments, networking, and troubleshooting
  - GitHub Actions for CI/CD pipeline development and maintenance
- Solid experience with Grafana, including dashboard creation, alerting configuration, and incident management
- Hands-on experience with Prometheus, Loki, or other observability tools in the Grafana ecosystem
- Proficiency in at least one scripting or programming language such as Python or Bash
- Understanding of networking fundamentals, DNS, load balancing, and container orchestration concepts
- Strong analytical and communication skills; able to diagnose complex system issues and clearly communicate findings
- Demonstrated ability to collaborate across teams and contribute to a culture of reliability
- Experience working in an agile environment with modern DevOps practices
Benefits
- 100% remote
- Health insurance through Aetna (we pay 100% of premiums)
- Dental and vision insurance through Guardian (we pay 100% of premiums)
- Basic life insurance (we pay 100% of premiums)
- Access to a flexible spending account (FSA) or health savings account (HSA) (for those using HSA-eligible plans)
- 401(k) plan (up to 4% match with immediate vesting)
- Flexible PTO policy offering up to 3 weeks of time off during the first twelve months of employment
- After the first year of employment, PTO increases to up to 4 to 5 weeks of time off annually
- 12 company-paid holidays each year
- Continuing education annual stipend
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Azure Cloud services, Kubernetes, Azure Kubernetes Service (AKS), CI/CD pipelines, GitHub Actions, Grafana, Prometheus, Loki, scripting, Python
Soft Skills
analytical skills, communication skills, collaboration, problem-solving, incident management