Net Health

Site Reliability Engineer III

full-time

Location Type: Remote

Location: Pennsylvania, United States

Salary

💰 $108,720 - $135,900 per year

About the role

  • Collaboratively manage the performance, stability, and redundancy of all Platform systems and infrastructure.
  • Lead emergency response efforts in conjunction with Engineering, Infrastructure, and Database teams to establish root cause.
  • Lead efforts to build robust monitoring solutions while expanding the current monitoring and alerting footprint.
  • Conduct Blameless Postmortems and Anomaly Investigations after incidents to further analyze root cause and create permanent solutions to improve serviceability and prevent future outages.
  • Establish a Don’t Repeat Incidents (DRI) culture by learning from past issues and always looking to improve monitoring and dashboarding capabilities.
  • Ensure applications are performing efficiently, collaborating with development teams and architecture to resolve application performance issues.
  • Consult with management to analyze short- and long-range business requirements and recommend innovations.
  • Champion automation efforts to reduce or eliminate repetitive, manual processes.
  • Partner with project management to define Service Level Objectives (SLOs) and to identify and implement Service Level Indicators (SLIs) that track compliance.
  • Champion capacity management and disaster recovery testing efforts.

Requirements

  • Bachelor’s degree in computer science or equivalent
  • 6+ years’ progressive experience in IT Operations and/or systems management
  • 6+ years’ direct experience in a technical role dealing with complex enterprise software landscapes (DevOps-focused development)
  • 6+ years’ experience with scripting and automating technical activities
  • Experience with best-in-class application performance monitoring (APM) tooling (New Relic, Dynatrace, AppDynamics)
  • Direct, hands-on experience with automated software and system management
  • Strong knowledge of change control best practices and methodologies
  • Experience with Ansible, Terraform, Python, or Docker (or similar) is a plus
  • Experience with Agile development methodology and/or ITIL ITSM is a plus

Benefits

  • Unlimited PTO
  • Comprehensive Benefits Package
  • Employee Resource Groups
  • Casual Dress Code
  • Prioritized Employee Wellness
  • Diversity And Inclusion
  • A Voice
  • New Hire Support
  • Career Development
  • Educational Assistance
  • Employee Referral Bonus
  • Progressive Parental Leave