Senior Site Reliability Engineer
Coterie
Senior Site Reliability Engineer at Coterie Insurance, responsible for managing Azure infrastructure and enhancing CI/CD processes. Join a mission-driven team focusing on small business insurance solutions.
Tech Stack
Tools & technologies: Azure, Cloud, DNS, Grafana, Kubernetes, Prometheus, Python
About the role
Key responsibilities & impact
- Manage and maintain cloud infrastructure on Azure, including Azure Kubernetes Service (AKS) clusters and supporting resources
- Build, improve, and maintain CI/CD pipelines using GitHub Actions to support reliable and repeatable deployments
- Own and enhance our Grafana implementation: design dashboards, configure alerts, and support incident management workflows
- Monitor system health, triage incidents, and drive root cause analysis to prevent recurrence
- Collaborate with development teams to define and track SLIs, SLOs, and error budgets that align with business goals
- Contribute to infrastructure-as-code practices using Pulumi
- Identify and resolve reliability risks through capacity planning, performance tuning, and proactive system improvements
- Participate in an on-call rotation to support production systems and respond to incidents
- Document runbooks, operational procedures, and architectural decisions to support team knowledge sharing
Requirements
What you’ll need
- 5+ years of experience in a Site Reliability Engineering, DevOps, or Infrastructure role
- 3+ years of experience working with infrastructure as code
- 2+ years of experience architecting CI/CD pipelines and cloud-based infrastructure
- Strong hands-on experience with:
  - Azure Cloud services and resource management
  - Kubernetes and AKS administration, including deployments, networking, and troubleshooting
  - GitHub Actions for CI/CD pipeline development and maintenance
- 3+ years of experience with Grafana or similar tooling, including dashboard creation, alerting configuration, and incident management
- Hands-on experience with Prometheus, Loki, or other observability tools in the Grafana ecosystem
- Proficiency in at least one scripting or programming language such as Python or Bash
- Understanding of networking fundamentals, DNS, load balancing, and container orchestration concepts
- Strong analytical and communication skills; able to diagnose complex system issues and clearly communicate findings
- Demonstrated ability to collaborate across teams and contribute to a culture of reliability
- Experience working in an agile environment with modern DevOps practices
Benefits
Comp & perks
- 100% remote
- Health insurance through Aetna (we pay 100% of premiums)
- Dental and vision insurance through Guardian (we pay 100% of premiums)
- Basic life insurance (we pay 100% of premiums)
- Access to a flexible spending account (FSA) or health savings account (HSA) (for those using HSA-eligible plans)
- 401(k) plan (up to 4% match with immediate vesting)
- Must be 21 years of age or older to participate
- Flexible PTO policy offering employees up to 4 weeks of PTO in their first 12 months. Thereafter, PTO usage aligns with company standards and typically does not exceed 5 weeks per calendar year.
- 12 company-paid holidays each year
- Continuing education annual stipend
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Azure, Kubernetes, AKS, CI/CD pipelines, GitHub Actions, Grafana, Prometheus, Loki, infrastructure as code, scripting
Soft Skills
analytical skills, communication skills, collaboration, incident management, problem-solving