
Senior Manager, Site Reliability Engineering
PayPal
full-time
Location Type: Hybrid
Location: Scottsdale • Arizona • California • United States
Salary
💰 $152,500 - $226,600 per year
About the role
- Manage and mentor a team of site reliability engineers, setting performance objectives, providing technical guidance, and ensuring alignment with business goals.
- Oversee the execution of reliability initiatives, ensuring critical systems maintain high availability, resilience, and performance at scale.
- Work with engineering, operations, and product teams to ensure seamless integration of reliability best practices into the development, deployment, and operational processes.
- Lead incident management activities, including coordination of response efforts, root cause analysis, and implementing solutions to prevent future incidents.
- Define and track key performance indicators (KPIs) related to system reliability, availability, and performance, reporting results to leadership regularly.
- Promote and drive automation within the site reliability engineering team, ensuring processes are streamlined and systems operate with minimal manual intervention.
- Manage capacity planning efforts, ensuring the scalability of systems and the ability to handle increasing traffic and resource demands effectively.
- Ensure the development and testing of disaster recovery plans and procedures, minimizing downtime in the event of a failure.
- Lead career development and mentorship efforts for team members, ensuring engineers have the tools and opportunities to grow their skills and advance their careers.
Requirements
- 8+ years of relevant experience and a Bachelor’s degree, or any equivalent combination of education and experience.
- Experience leading others.
- Bachelor’s degree in Computer Science, Information Technology, or a related field; Master’s preferred.
- 8+ years of experience in infrastructure management, with at least 3 years in a leadership role.
- Extensive experience with multiple cloud platforms (AWS, Azure, GCP) and on-premises infrastructure management.
- Demonstrated experience building or scaling AI/ML-based automation for operations, including AIOps platforms, alert noise reduction, auto-remediation, and intelligent runbooks.
- Strong background in incident management, ITIL frameworks, and operational best practices.
- Experience with monitoring tools, automation platforms, and infrastructure-as-code technologies.
Benefits
- generous paid time off
- healthcare coverage for you and your family
- resources to create financial security and support your mental health
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
site reliability engineering, infrastructure management, cloud platforms, AI/ML-based automation, incident management, ITIL frameworks, monitoring tools, automation platforms, infrastructure-as-code, disaster recovery
Soft Skills
team management, mentorship, technical guidance, performance objectives, communication, leadership, capacity planning, problem-solving, collaboration, career development
Certifications
Bachelor’s degree in Computer Science, Bachelor’s degree in Information Technology, Master’s degree (preferred)