
Senior Program Manager, Incident Management
Zapier
full-time
Location Type: Remote
Location: United States
Salary
💰 $174,200 - $261,300 per year
About the role
- Own the incident program. Lead the design, evolution, and governance of incident processes across the Build organization, covering both response and post-incident processes.
- Ensure workflows are consistent, auditable, and aligned with enterprise expectations.
- You are the DRI (directly responsible individual) for incident management as a program.
- Design and ship repeatable AI tools: automated incident summarization, intelligent severity classification, AI-assisted root cause analysis, postmortem draft generation, and more.
- Turn one-off AI experiments into durable workflows that compound over time.
- Create clarity in ambiguity, align stakeholders, and drive decisions across teams and zones.
- Serve as the point of contact for questions related to incident process, expectations, and best practices.
- Identify recurring org friction, drive root-cause solutions, and implement fixes that persist beyond individual incidents.
- Build, maintain, and refine dashboards and reports using Databricks, Looker, and related tools.
- Translate data into actionable insight: identify trends, risks, weak signals, and hotspots.
- Communicate findings to the right audiences.
- Instill rigor and accountability. Coach responders and the people filling incident roles (Incident Commander, Support Leads, and new roles as they emerge).
- Produce and maintain clear documentation (playbooks, templates, guides) and deliver training for all incident roles and stakeholder groups.
- Collaborate with engineering leads, EMs, product, support, security, GTM, and leadership to strengthen practices.
- Step into incident response roles during business hours as appropriate to experience the work firsthand and inform program improvements.
Requirements
- You have deep incident management experience and you've moved beyond just executing it.
- You've built and led incident response programs, post-incident processes, SRE practices, or reliability-focused work.
- You've ideally done 0-to-1 work in this space: stood up programs, defined standards, trained responders.
- You've created repeatable systems (workflows, agents, copilots, or automation) that fundamentally changed how work gets done.
- You use AI-native tools (Cursor, Claude Code, or similar) as your default and orchestrate them into durable capabilities that compound over time.
- You can quantify the impact on velocity, quality, or organizational capacity.
- You iterate, refine, and critically evaluate AI outputs, embedding quality standards and accountability into the systems you build, not just the outputs.
- You have deep expertise in incident management, but you're not rigidly attached to how you've done it before.
- You can stretch into adjacent areas (reliability strategy, enterprise readiness, operational tooling) as the role evolves.
- You instinctively look for root causes and design solutions that scale beyond your immediate program.
- You understand how the full incident lifecycle (prevention, detection, response, learning) supports customer trust and enterprise readiness.
- You shape outcomes by building trust.
- You know how to build coalitions across engineering, support, security, GTM, and leadership.
- You lead change rather than just implementing it, and you make it stick.
- You can go toe-to-toe with engineers, support leads, and product leaders to clarify the "why" behind technical tradeoffs and incident decisions.
- You act decisively even in high ambiguity. When priorities collide, you clarify, decide, and help the org move forward.
- You communicate with relentless clarity: you share context and intent early, often, and candidly, especially when it's uncomfortable.
- You can work directly with data tools (e.g., Databricks, SQL) to build rich reporting and meaningful insights.
- You understand incident tooling (incident.io or similar) and how it integrates with Slack, PagerDuty, and on-call workflows.
- You work well remotely.
- You must be authorized to work in the country, either through nationality or a valid work permit.
Benefits
- Offers Equity
- Offers Bonus
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
incident management, AI tools, automated incident summarization, intelligent severity classification, root cause analysis, data analysis, reporting, workflow automation, SQL, SRE practices
Soft Skills
leadership, communication, decision-making, collaboration, problem-solving, coaching, trust-building, adaptability, clarity in ambiguity, change management