Senior Program Manager, Incident Management

Zapier

Posted 4/9/2026 • Full-time • Remote • 🇺🇸 United States • Senior • 💰 $174,200 - $261,300 per year

Tech Stack

Tools & technologies

SQL

About the role

Key responsibilities & impact

Own the incident program. Lead the design, evolution, and governance of incident processes across the Build organization, covering both response and post-incident processes.
Ensure workflows are consistent, auditable, and aligned with enterprise expectations.
You are the DRI for incident management as a program.
Design and ship repeatable AI tools: automated incident summarization, intelligent severity classification, AI-assisted root cause analysis, postmortem draft generation, and more.
Turn one-off AI experiments into durable workflows that compound over time.
Create clarity in ambiguity, align stakeholders, and drive decisions across teams and zones.
Serve as the point of contact for questions related to incident process, expectations, and best practices.
Identify recurring org friction, drive root-cause solutions, and implement fixes that persist beyond individual incidents.
Build, maintain, and refine dashboards and reports using Databricks, Looker, and related tools.
Translate data into actionable insight: identify trends, risks, weak signals, and hotspots.
Communicate findings to the right audiences.
Instill rigor and accountability. Coach responders and incident roles (Incident Commander, Support Leads, and new roles as they emerge).
Produce and maintain clear documentation (playbooks, templates, guides) and deliver training for all incident roles and stakeholder groups.
Collaborate with engineering leads, EMs, product, support, security, GTM, and leadership to strengthen practices.
Step into incident response roles during business hours as appropriate to experience the work firsthand and inform program improvements.

Requirements

What you’ll need

You have deep incident management experience and you've moved beyond just executing it.
You've built and led incident response programs, post-incident processes, SRE practices, or reliability-focused work.
You've ideally done 0-to-1 work in this space: stood up programs, defined standards, trained responders.
You've created repeatable systems (workflows, agents, copilots, or automation) that fundamentally changed how work gets done.
You use AI-native tools (Cursor, Claude Code, or similar) as your default and orchestrate them into durable capabilities that compound over time.
You can quantify the impact on velocity, quality, or organizational capacity.
You iterate, refine, and critically evaluate AI outputs, embedding quality standards and accountability into the systems you build, not just the outputs.
You have deep expertise in incident management, but you're not rigidly attached to how you've done it before.
You can stretch into adjacent areas (reliability strategy, enterprise readiness, operational tooling) as the role evolves.
You instinctively look for root causes and design solutions that scale beyond your immediate program.
You understand how the full incident lifecycle (prevention, detection, response, learning) supports customer trust and enterprise readiness.
You shape outcomes by building trust.
You know how to build coalitions across engineering, support, security, GTM, and leadership.
You lead change rather than just implementing it, and you make it stick.
You can go toe-to-toe with engineers, support leads, and product leaders to clarify the "why" behind technical tradeoffs and incident decisions.
You act decisively even in high ambiguity. When priorities collide, you clarify, decide, and help the org move forward.
You communicate with relentless clarity: context and intent early, often, and candidly, especially when it's uncomfortable.
You can work directly with data tools (e.g., Databricks, SQL) to build rich reporting and meaningful insights.
You understand incident tooling (incident.io or similar) and how it integrates with Slack, PagerDuty, and on-call workflows.
You work well remotely.
You must be authorized to work in the country, either by nationality or through a valid work permit.

Benefits

Comp & perks

Offers Equity
Offers Bonus

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

incident management, AI tools, automated incident summarization, intelligent severity classification, root cause analysis, data analysis, reporting, workflow automation, SQL, SRE practices

Soft Skills

leadership, communication, decision-making, collaboration, problem-solving, coaching, trust-building, adaptability, clarity in ambiguity, change management