
Strategic Ops Engineer III
Backblaze
Full-time
Location Type: Remote
Location: United States
Salary
💰 $123,000 - $175,000 per year
About the role
- Lead and govern the end-to-end incident management lifecycle, including detection, triage, escalation, and resolution.
- Drive major incident management (MIM) processes and communications.
- Improve MTTR (Mean Time to Resolution) through automation and process optimization.
- Establish and maintain incident response playbooks and runbooks.
- Maintain and improve AI/ML-driven heatmaps that surface recurring technical themes and help prioritize long-term remediation.
- Implement trend analysis and proactive problem identification using observability data and AI.
- Track and manage problem records to closure.
- Govern change management processes, including leading the Change Advisory Board (CAB), to ensure safe, compliant, and low-risk deployments.
- Define and enforce change policies, risk assessments, and approval workflows.
- Drive continuous improvement in release and deployment practices.
- Maintain a strong understanding of system architecture and monitoring strategies, identifying gaps and opportunities for improvement.
- Partner with engineering teams to improve system resilience and performance.
- Reduce alert fatigue by improving signal-to-noise ratio in monitoring systems.
- Leverage AI/ML for anomaly detection, predictive alerting, and automated root cause analysis.
- Implement AI-driven solutions to optimize incident response and operational workflows.
- Analyze large-scale operational data to identify patterns and recommend improvements.
Requirements
- 5+ years of experience in IT Operations, SRE, or similar roles.
- Strong expertise in Incident, Problem, and Change Management (ITIL or similar frameworks).
- Proven experience in governing and optimizing operational processes.
- AI & Data Expertise: Strong knowledge of AI/ML concepts, including anomaly detection, predictive analytics, and data modeling.
- AIOps Experience: Hands-on experience with AIOps platforms or building AI-driven operational solutions (event correlation, alert prioritization).
Benefits
- Health insurance
- Paid time off
- Professional development opportunities
- Remote work options
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
incident management, problem management, change management, AI, machine learning, automation, process optimization, trend analysis, data modeling, observability
Soft Skills
leadership, communication, continuous improvement, collaboration, problem-solving
Certifications
ITIL