
Site Reliability Engineer III
Net Health
full-time
Location Type: Remote
Location: Pennsylvania • United States
Salary
💰 $108,720 - $135,900 per year
About the role
- Collaboratively manage the performance, stability, and redundancy of all Platform systems and infrastructure.
- Lead emergency response efforts in conjunction with Engineering, Infrastructure, and Database teams to establish root cause.
- Lead efforts to build robust monitoring solutions while expanding the current monitoring and alerting footprint.
- Conduct Blameless Postmortems and Anomaly Investigations after incidents to further analyze root cause and create permanent solutions to improve serviceability and prevent future outages.
- Establish a Don’t Repeat Incidents (DRI) culture by learning from past issues and always looking to improve monitoring and dashboarding capabilities.
- Ensure applications are performing efficiently, collaborating with development teams and architecture to resolve application performance issues.
- Consult with management to analyze short- and long-range business requirements and recommend innovations.
- Champion automation efforts to reduce or eliminate repetitive, manual processes.
- Partner with project management to define Service Level Objectives (SLOs), and identify and implement Service Level Indicators (SLIs) to track compliance.
- Champion capacity management and disaster recovery testing efforts.
Requirements
- Bachelor’s degree in computer science or equivalent
- 6+ years’ progressive experience in IT Operations and/or systems management
- 6+ years’ direct experience in a technical role dealing with complex enterprise software landscapes (DevOps-focused development)
- 6+ years’ experience with scripting and automating technical activities
- Experience with best-in-class application monitoring (APM) tooling (New Relic, Dynatrace, AppDynamics)
- Direct, hands-on experience with automated software and system management
- Strong knowledge of change control best practices and methodologies
- Experience with Ansible, Terraform, Python, or Docker (or similar) is a plus
- Experience with Agile development methodology and/or ITIL ITSM is a plus
Benefits
- Unlimited PTO
- Comprehensive Benefits Package
- Employee Resource Groups
- Casual Dress Code
- Prioritized Employee Wellness
- Diversity And Inclusion
- A Voice
- New Hire Support
- Career Development
- Educational Assistance
- Employee Referral Bonus
- Progressive Parental Leave
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
scripting, automation, application monitoring, DevOps, Ansible, Terraform, Python, Docker, Agile development, ITIL ITSM
Soft Skills
collaboration, leadership, problem-solving, innovation, capacity management, disaster recovery, communication, analytical thinking, root cause analysis, continuous improvement
Certifications
Bachelor’s degree in computer science