About the role
Key responsibilities & impact
- Leads an enterprise-wide Reactive Problem Management function covering the full application portfolio; accountable for consistent execution, quality, and outcomes of post-incident problem investigations.
- Ensures problems reach true root cause using structured analysis and evidence-based conclusions; sets quality standards for problem records, timelines, and closure criteria.
- Creates, assigns, and rigorously tracks corrective and preventive Action Items, driving cross-team accountability through completion and validating effectiveness in reducing recurrence and improving resiliency.
- Partners with application, platform, infrastructure, network, security, and operations teams after service restoration to coordinate investigations, remove blockers, and align on remediation plans.
- Drives measurable improvement in resiliency, reduction in customer disruption, and reduced MTTR through elimination of repeat incidents, improved detection/diagnostics, and better operational readiness.
- Uses impact, recurrence, and risk to influence engineering and platform backlogs without direct change or governance ownership, ensuring the highest-value remediation work is prioritized and delivered.
- Produces clear, executive-ready summaries of root cause, contributing factors, risk exposure, remediation progress, and expected impact; escalates when commitments or timelines are at risk.
- Manages and develops a team of 7–10 Problem Managers/analysts; coaches structured problem-solving, stakeholder management, and crisp documentation; sets expectations for pace, rigor, and accountability.
- Establishes and maintains standard operating practices for intake, severity/priority, aging management, escalation, and closure; ensures consistency across business units and application teams.
- Identifies systemic patterns across incidents and problems, recommends enterprise-level resiliency improvements, and drives preventative initiatives based on trend and impact analysis.
Requirements
What you’ll need
- 5+ years of related experience.
- Bachelor’s degree (BS/BA) in Computer Science preferred.
- Supervisor: Yes
Benefits
Comp & perks
- Medical/Dental/Vision coverage
- 401(k) plan
- Tuition reimbursement program
- Paid Time Off and Holidays (based on date of hire, at least 23 days of vacation each year and 9 company-designated holidays)
- Paid Parental Leave
- Paid Caregiver Leave
- Additional sick leave beyond what state and local law require may be available but is unprotected
- Adoption Reimbursement
- Disability Benefits (short term and long term)
- Life and Accidental Death Insurance
- Supplemental benefit programs: critical illness/accident hospital indemnity/group legal
- Employee Assistance Programs (EAP)
- Extensive employee wellness programs
- Employee discounts up to 50% off on eligible AT&T mobility plans and accessories, AT&T internet (and fiber where available) and AT&T phone
Applicant Tracking System (ATS) Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- problem management
- root cause analysis
- structured analysis
- corrective action
- preventive action
- risk analysis
- incident management
- operational readiness
- trend analysis
- measurable improvement
Soft Skills
- leadership
- coaching
- stakeholder management
- communication
- accountability
- team management
- documentation
- cross-team collaboration
- quality assurance
- executive reporting
