
Senior Site Reliability Engineer
Penn Mutual
Full-time
Location Type: Remote
Location: United States
Salary
💰 $128,000–$165,000 per year
About the role
- Lead reliability, availability, scalability, and recovery design for critical systems.
- Define and evolve SLOs, SLIs, and error budget practices across services.
- Identify systemic reliability risks and drive cross-team remediation efforts.
- Influence application and platform architecture to improve operational outcomes.
- Act as a technical lead during major incidents and complex outages.
- Drive high-quality root cause analysis and recommend corrective actions.
- Improve incident response processes, tooling, and runbooks.
- Design and implement advanced automation to eliminate operational toil at scale.
- Build and maintain shared SRE tooling and platforms.
- Set engineering standards for reliability-focused code and operational practices.
- Review and improve CI/CD, deployment, and rollback strategies.
- Partner with Release and Change Management to automate release practices.
- Lead risk assessments for high-impact changes and releases.
- Ensure compliance requirements are met without sacrificing engineering velocity.
- Serve as a reliability authority for release readiness decisions.
- Mentor junior SREs and engineers through technical guidance and review.
- Lead by example in operational excellence and engineering rigor.
- Influence reliability culture across engineering and product teams.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or related field.
- 6–10+ years of experience in SRE, software engineering, platform, or DevOps roles.
- Professional experience performing root cause analysis on incidents and documenting SRE systems and their usage.
- Strong programming skills with professional experience in multiple languages.
- Deep experience with AWS and distributed systems.
- Advanced knowledge of observability, ITSM, and reliability engineering principles.
- Proven ability to operate effectively in complex, regulated environments.
- Experience using and implementing observability tools (metrics, logs, tracing).
- Experience with CI/CD pipelines and deployment automation.
- Experience investigating and documenting the root causes of incidents.
- Familiarity with containerization and orchestration technologies.
- Strong troubleshooting and analytical skills.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SRE, root cause analysis, programming languages, AWS, observability, CI/CD, deployment automation, containerization, orchestration technologies, reliability engineering
Soft Skills
mentoring, leadership, analytical skills, troubleshooting, influence, communication, operational excellence, engineering rigor, collaboration, technical guidance