Backblaze

Strategic Ops Engineer III

Full-time

Location Type: Remote

Location: United States

Salary

💰 $123,000 - $175,000 per year

About the role

  • Lead and govern the end-to-end incident management lifecycle, including detection, triage, escalation, and resolution.
  • Drive major incident management (MIM) processes and communications.
  • Improve MTTR (Mean Time to Resolution) through automation and process optimization.
  • Establish and maintain incident response playbooks and runbooks.
  • Maintain and improve intelligent heatmaps leveraging AI/ML to identify recurring technical themes and prioritize long-term remediation.
  • Implement trend analysis and proactive problem identification using observability data and AI.
  • Track and manage problem records to closure.
  • Govern change management processes (lead the CAB), ensuring safe, compliant, and low-risk deployments.
  • Define and enforce change policies, risk assessments, and approval workflows.
  • Drive continuous improvement in release and deployment practices.
  • Maintain a strong understanding of system architecture and monitoring strategies, identifying gaps and opportunities for improvement.
  • Partner with engineering teams to improve system resilience and performance.
  • Reduce alert fatigue by improving signal-to-noise ratio in monitoring systems.
  • Leverage AI/ML for anomaly detection, predictive alerting, and automated root cause analysis.
  • Implement AI-driven solutions to optimize incident response and operational workflows.
  • Analyze large-scale operational data to identify patterns and recommend improvements.

Requirements

  • 5+ years of experience in IT Operations, SRE, or similar roles.
  • Strong expertise in Incident, Problem, and Change Management (ITIL or similar frameworks).
  • Proven experience in governing and optimizing operational processes.
  • AI & Data Expertise: Strong knowledge of AI/ML concepts, including anomaly detection, predictive analytics, and data modeling.
  • AIOps Experience: Hands-on experience with AIOps platforms or building AI-driven operational solutions (event correlation, alert prioritization).

Benefits

  • Health insurance
  • Paid time off
  • Professional development opportunities
  • Remote work options

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

  • incident management, problem management, change management, AI, machine learning, automation, process optimization, trend analysis, data modeling, observability

Soft Skills

  • leadership, communication, continuous improvement, collaboration, problem-solving

Certifications

  • ITIL