Penn Mutual

Senior Site Reliability Engineer

full-time

Location Type: Remote

Location: United States

Salary

💰 $128,000 - $165,000 per year

About the role

  • Lead reliability, availability, scalability, and recovery design for critical systems.
  • Define and evolve SLOs, SLIs, and error budget practices across services.
  • Identify systemic reliability risks and drive cross-team remediation efforts.
  • Influence application and platform architecture to improve operational outcomes.
  • Act as a technical lead during major incidents and complex outages.
  • Drive high-quality root cause analysis and recommend corrective actions.
  • Improve incident response processes, tooling, and runbooks.
  • Design and implement advanced automation to eliminate operational toil at scale.
  • Build and maintain shared SRE tooling and platforms.
  • Set engineering standards for reliability-focused code and operational practices.
  • Review and improve CI/CD, deployment, and rollback strategies.
  • Partner with Release and Change Management to automate release practices.
  • Lead risk assessments for high-impact changes and releases.
  • Ensure compliance requirements are met without sacrificing engineering velocity.
  • Serve as a reliability authority for release readiness decisions.
  • Mentor junior SREs and junior engineers through technical guidance and review.
  • Lead by example in operational excellence and engineering rigor.
  • Influence reliability culture across engineering and product teams.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related field.
  • 6–10+ years of experience in SRE, software engineering, platform, or DevOps roles.
  • Professional experience performing root cause analysis on incidents and documenting SRE systems and their usage.
  • Strong programming skills with professional experience in multiple languages.
  • Deep experience with AWS and distributed systems.
  • Advanced knowledge of observability, ITSM, and reliability engineering principles.
  • Proven ability to operate effectively in complex, regulated environments.
  • Experience using and implementing observability tools (metrics, logs, tracing).
  • Experience with CI/CD pipelines and deployment automation.
  • Experience investigating and documenting root cause analyses.
  • Familiarity with containerization and orchestration technologies.
  • Strong troubleshooting and analytical skills.
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
SRE, root cause analysis, programming languages, AWS, observability, CI/CD, deployment automation, containerization, orchestration technologies, reliability engineering
Soft Skills
mentoring, leadership, analytical skills, troubleshooting, influence, communication, operational excellence, engineering rigor, collaboration, technical guidance