Vanguard

Senior Site Reliability Engineer

A senior site reliability role that evaluates and implements resilience standards for Vanguard's platforms, leads post-incident reviews, and collaborates with product teams on reliability risks.

Posted 6/23/2026 • Full-time • Wayne • North Carolina, Pennsylvania, Texas • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
  • Python
  • RPA
  • Splunk

About the role

Key responsibilities & impact
  • Evaluate applications, platforms, and vendors to assess resiliency, reliability, and operational risk.
  • Design and implement processes that enforce enterprise resiliency and reliability standards.
  • Lead blameless post‑incident reviews for high‑severity incidents or incidents spanning multiple complex product families.
  • Partner with product and platform teams to proactively identify and remediate reliability risks before they impact clients.
  • Develop, communicate, and evangelize new standards, tools, and frameworks across subdivisions, ensuring consistent adoption.
  • Troubleshoot complex production issues and implement durable solutions that prevent recurrence.
  • Participate in a periodic on‑call rotation to support production stability.
  • Evaluate and onboard resiliency and reliability tooling.
  • Actively participate in reliability engineering and resilience communities of practice, contributing to shared learning and enterprise consistency.
  • Contribute to strategic initiatives that advance Vanguard’s operational maturity and resiliency posture.

Requirements

What you’ll need
  • Experience with modern observability and monitoring tools, such as Splunk, Honeycomb, CloudWatch, Dynatrace, or AppDynamics.
  • Strong understanding of SLIs, SLOs, and SLAs, including dashboarding and reporting practices.
  • Experience with alert design, anomaly detection, predictive alerting, and synthetic monitoring using structured methodologies.
  • Experience with automation and resilience practices, such as Python-based automation, RPA platforms (e.g., Blue Prism, UiPath), chaos engineering, and failure analysis techniques (e.g., FMEA).

Benefits

Comp & perks
  • Health insurance
  • Retirement plans
  • Paid time off
  • Flexible work arrangements
  • Professional development

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • Python
  • RPA
  • Chaos engineering
  • Failure analysis
  • SLIs
  • SLOs
  • SLAs
  • Alert design
  • Anomaly detection
  • Synthetic monitoring
Soft Skills
  • Leadership
  • Communication
  • Collaboration
  • Problem-solving
  • Strategic thinking