Zapier

Senior Program Manager, Incident Management

full-time

Location Type: Remote

Location: United States

Salary

💰 $174,200 - $261,300 per year

About the role

  • Own the incident program. Lead the design, evolution, and governance of incident processes across the Build organization, covering both response and post-incident processes.
  • Ensure workflows are consistent, auditable, and aligned with enterprise expectations.
  • You are the DRI for incident management as a program.
  • Design and ship repeatable AI tools: automated incident summarization, intelligent severity classification, AI-assisted root cause analysis, postmortem draft generation, and more.
  • Turn one-off AI experiments into durable workflows that compound over time.
  • Create clarity in ambiguity, align stakeholders, and drive decisions across teams and zones.
  • Serve as the point of contact for questions related to incident process, expectations, and best practices.
  • Identify recurring org friction, drive root-cause solutions, and implement fixes that persist beyond individual incidents.
  • Build, maintain, and refine dashboards and reports using Databricks, Looker, and related tools.
  • Translate data into actionable insight: identify trends, risks, weak signals, and hotspots.
  • Communicate findings to the right audiences.
  • Instill rigor and accountability. Coach responders and incident roles (Incident Commander, Support Leads, and new roles as they emerge).
  • Produce and maintain clear documentation (playbooks, templates, guides) and deliver training for all incident roles and stakeholder groups.
  • Collaborate with engineering leads, EMs, product, support, security, GTM, and leadership to strengthen practices.
  • Step into incident response roles during business hours, as appropriate, to experience the work firsthand and inform program improvements.

Requirements

  • You have deep incident management experience and you've moved beyond just executing it.
  • You've built and led incident response programs, post-incident processes, SRE practices, or reliability-focused work.
  • Ideally, you've done 0-to-1 work in this space: stood up programs, defined standards, and trained responders.
  • You've created repeatable systems (workflows, agents, copilots, or automation) that fundamentally changed how work gets done.
  • You use AI-native tools (Cursor, Claude Code, or similar) as your default and orchestrate them into durable capabilities that compound over time.
  • You can quantify the impact on velocity, quality, or organizational capacity.
  • You iterate, refine, and critically evaluate AI outputs, embedding quality standards and accountability into the systems you build, not just the outputs.
  • You have deep expertise in incident management, but you're not rigidly attached to how you've done it before.
  • You can stretch into adjacent areas (reliability strategy, enterprise readiness, operational tooling) as the role evolves.
  • You instinctively look for root causes and design solutions that scale beyond your immediate program.
  • You understand how the full incident lifecycle (prevention, detection, response, learning) supports customer trust and enterprise readiness.
  • You shape outcomes by building trust.
  • You know how to build coalitions across engineering, support, security, GTM, and leadership.
  • You lead change rather than just implementing it, and you make it stick.
  • You can go toe-to-toe with engineers, support leads, and product leaders to clarify the "why" behind technical tradeoffs and incident decisions.
  • You act decisively even in high ambiguity. When priorities collide, you clarify, decide, and help the org move forward.
  • You communicate with relentless clarity: context and intent early, often, and candidly, especially when it's uncomfortable.
  • You can work directly with data tools (e.g., Databricks, SQL) to build rich reporting and meaningful insights.
  • You understand incident tooling (incident.io or similar) and how it integrates with Slack, PagerDuty, and on-call workflows.
  • You work well remotely.
  • You must be authorized to work in the country, either by nationality or through a valid work permit.

Benefits

  • Offers Equity
  • Offers Bonus