AVP, Recovery Manager
LPL Financial
Lead and coordinate cross‑functional technical teams during major and critical incidents, ensuring timely recovery and effective stakeholder engagement.
Posted 5/15/2026 • full-time • Fort Mill • South Carolina, Texas • 🇺🇸 United States • Lead • 💰 $112,476 - $187,460 per year
Tech Stack
Tools & technologies
- Cloud
About the role
Key responsibilities & impact
- Lead and coordinate cross‑functional technical teams during major and critical incidents, ensuring timely recovery and effective stakeholder engagement.
- Serve as a recovery lead during declared major incidents, maintaining focus on service restoration and customer impact.
- Participate in and facilitate post‑incident reviews and post‑mortems, ensuring outcomes are actionable and measurable.
- Drive high‑quality root cause analysis for major incidents using structured techniques such as 5‑Why, Fishbone, and Blameless RCA.
- Ensure contributing factors (process, technology, observability, automation, or human factors) are clearly identified and documented.
- Partner with domain teams to translate findings into concrete remediation actions.
- Develop, document, and maintain incident recovery plans, SOPs, runbooks, and playbooks in collaboration with domain owners.
- Support and execute mock drills, recovery tests, and readiness exercises to improve response effectiveness.
- Ensure recovery documentation remains accurate, consumable, and operationally relevant.
- Work with application, infrastructure, and platform teams to improve diagnostic accuracy and time‑to‑engage during incidents.
- Help establish clear ownership, escalation paths, and recovery patterns to reduce dependency on ad‑hoc tribal knowledge.
- Promote repeatable recovery patterns across services.
- Identify opportunities to improve service reliability, operational maturity, and recovery effectiveness.
- Analyze incident data and trends to recommend targeted improvements across people, process, and technology.
- Support adoption of SRE‑aligned practices, including error budgets, readiness reviews, and failure mode awareness.
- Provide structured feedback to Observability, Automation, Resiliency, and Domain teams on gaps in monitoring, alerts, and diagnostics; single points of failure; and architectural or design weaknesses impacting recoverability.
- Act as an operational voice to ensure post‑incident learnings inform engineering and platform decisions.
- Mentor junior recovery managers or operational staff through hands‑on incident participation and coaching.
- Contribute to operational training sessions, tabletop exercises, and knowledge‑sharing initiatives.
- Maintain awareness of industry best practices in production operations, incident management, and SRE.
Requirements
What you’ll need
- 5+ years of experience in Production Services, Incident Management, Recovery Management, Problem Management, SRE, DevOps, or related disciplines
- 2+ years of experience with application, infrastructure, and/or cloud technologies, enabling effective triage and informed recovery leadership
- 2+ years of experience using observability tools, logs, metrics, and diagnostics to troubleshoot production issues
Benefits
Comp & perks
- 401K matching
- health benefits
- employee stock options
- paid time off
- volunteer time off
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to improve ATS matching.
Hard Skills & Tools
incident management, recovery management, problem management, site reliability engineering (SRE), DevOps, root cause analysis, observability, diagnostics, cloud technologies, production services
Soft Skills
leadership, stakeholder engagement, communication, mentoring, collaboration, analytical thinking, problem-solving, organizational skills, coaching, training