Senior Site Reliability Engineer
Vanguard
Senior Site Reliability Champion evaluating and implementing resilience standards for Vanguard's platforms. Leading post-incident reviews and collaborating with product teams on reliability risks.
Posted 6/23/2026 • full-time • Wayne • North Carolina, Pennsylvania, Texas • 🇺🇸 United States • Senior
Tech Stack
Tools & technologies
- Python
- RPA
- Splunk
About the role
Key responsibilities & impact
- Evaluate applications, platforms, and vendors to assess resiliency, reliability, and operational risk.
- Design and implement processes that enforce enterprise resiliency and reliability standards.
- Lead blameless post‑incident reviews for high‑severity incidents or incidents spanning multiple complex product families.
- Partner with product and platform teams to proactively identify and remediate reliability risks before they impact clients.
- Develop, communicate, and evangelize new standards, tools, and frameworks across subdivisions, ensuring consistent adoption.
- Troubleshoot complex production issues and implement durable solutions that prevent recurrence.
- Participate in a periodic on‑call rotation to support production stability.
- Evaluate and onboard resiliency and reliability tooling.
- Actively participate in reliability engineering and resilience communities of practice, contributing to shared learning and enterprise consistency.
- Contribute to strategic initiatives that advance Vanguard’s operational maturity and resiliency posture.
Requirements
What you’ll need
- Experience with modern observability and monitoring tools, such as Splunk, Honeycomb, CloudWatch, Dynatrace, or AppDynamics.
- Strong understanding of SLIs, SLOs, and SLAs, including dashboarding and reporting practices.
- Experience with alert design, anomaly detection, predictive alerting, and synthetic monitoring using structured methodologies.
- Experience with automation and resilience practices such as Python-based automation, RPA platforms (e.g., Blue Prism, UiPath), chaos engineering, and failure analysis techniques (e.g., FMEA).
Benefits
Comp & perks
- health insurance
- retirement plans
- paid time off
- flexible work arrangements
- professional development
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Python
- RPA
- chaos engineering
- failure analysis
- SLIs
- SLOs
- SLAs
- alert design
- anomaly detection
- synthetic monitoring
Soft Skills
- leadership
- communication
- collaboration
- problem-solving
- strategic thinking