
Staff Site Reliability Engineer – Incident Management & Reliability
Confluent
full-time
Location Type: Remote
Location: Canada
Salary
💰 CA$225,100 - CA$264,500 per year
About the role
- Analyze systemic failure patterns and design reliability improvements that prevent incident recurrence
- Own Rootly configuration, workflows, and integrations with PagerDuty, Jira, Confluence, and Slack
- Define and maintain SLO/SLA frameworks; use error budgets to guide reliability investments
- Own standards, practices, and continuous improvement of incident response across engineering
- Edit and review customer-facing incident documents (CRCAs) to ensure quality and clarity
- Develop and deliver training programs; coach teams through post-mortems
- Partner with engineering leaders to elevate reliability practices org-wide
Requirements
- 10+ years of relevant experience in SRE, incident management, or reliability engineering
- Cloud experience with at least one of AWS, GCP, or Azure (we run all three)
- Experience navigating reliability/incident programs at organizations with 500+ engineers
- Deep expertise with incident management tooling (Rootly, PagerDuty, or similar)
- Strong understanding of distributed systems and failure modes at scale
- Deep experience with observability: metrics, logging, tracing
- Kubernetes and container orchestration experience
- Understanding of CI/CD pipelines and release processes
- Strong written communication (design docs, runbooks, post-mortems)
- Experience driving org-wide process and cultural changes
- Kafka/event streaming expertise preferred, or demonstrated rapid mastery of complex systems
Benefits
- Offers Equity
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SRE, incident management, reliability engineering, cloud computing, AWS, GCP, Azure, Kubernetes, CI/CD, observability
Soft Skills
strong written communication, coaching, continuous improvement, process change, collaboration