SafeRide Health

Manager, Site Reliability Engineer

full-time

Location Type: Remote

Location: United States

About the role

  • Keeping systems and services running smoothly with minimal downtime by focusing on availability, reliability, and scalability.
  • Developing and maintaining tools and scripts to automate repetitive tasks such as deployments, configuration management, and monitoring.
  • Implementing and managing monitoring and alerting systems to provide visibility into system performance and quickly detect potential issues.
  • Responding to, diagnosing, and resolving system incidents, including conducting post-mortems to prevent future occurrences.
  • Monitoring system resource usage to forecast future needs and scale systems accordingly to handle increasing user load.
  • Collaborating with stakeholders to identify operational risks and implementing strategies to reduce their likelihood and impact.
  • Analyzing metrics from operating systems and applications to identify areas for performance improvement.
  • Providing direction to a team of direct reports and matrixed resources in alignment with Site Reliability objectives.
  • Managing the performance of SRE team members through regular 1:1s, coaching sessions, performance reviews, and performance management when necessary.

Requirements

  • Minimum of 8 years of progressive experience in an IT, Software Engineering, Technology Operations, or Business Continuity role.
  • Minimum of 3 years of hands-on experience in a Site Reliability, DevOps, or IT Observability role.
  • Minimum of 2 years of direct supervisory experience leading technology professionals.
  • Demonstrated proficiency with production monitoring and alerting tools (Datadog is a major plus!).
  • Basic proficiency working in a containerized AWS environment managed with infrastructure as code.

Benefits

  • Competitive compensation and performance-based bonus potential
  • Full medical, dental, and vision coverage
  • Generous PTO and paid company holidays
  • 401(k) with employer match
  • Paid parental leave and family support benefits

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

  • Site Reliability Engineering
  • DevOps
  • IT Observability
  • production monitoring
  • alerting systems
  • infrastructure as code
  • automation
  • configuration management
  • performance improvement
  • incident response

Soft Skills

  • leadership
  • collaboration
  • coaching
  • performance management
  • stakeholder engagement
  • problem-solving
  • analytical thinking
  • communication
  • risk management
  • team direction