SafeRide Health

Site Reliability Engineer

Employment Type: Full-time

Location Type: Remote

Location: United States

About the role

  • Keeping systems and services running smoothly with minimal downtime by focusing on availability, reliability, and scalability.
  • Developing and maintaining tools and scripts to automate repetitive tasks such as deployments, configuration management, and monitoring.
  • Implementing and managing monitoring and alerting systems to provide visibility into system performance and quickly detect potential issues.
  • Responding to, diagnosing, and resolving system incidents, including conducting post-mortems to prevent future occurrences.
  • Monitoring system resource usage to forecast future needs and scale systems accordingly to handle increasing user load.
  • Collaborating with stakeholders to identify operational risks and implementing strategies to reduce their likelihood and impact.
  • Analyzing metrics from operating systems and applications to identify areas for performance improvement.

Requirements

  • Minimum of 5 years of progressive experience in an IT, Software Engineering, Technology Operations, or Business Continuity role.
  • Minimum of 2 years of hands-on experience in a Site Reliability, DevOps, or IT Observability role.
  • Demonstrated proficiency with production monitoring and alerting tools (Datadog is a major plus!).
  • Basic proficiency working in a containerized AWS environment managed with infrastructure as code.

Benefits

  • Competitive compensation and performance-based bonus potential
  • Full medical, dental, and vision coverage
  • Generous PTO and paid company holidays
  • 401(k) with employer match
  • Paid parental leave and family support benefits

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
automation, configuration management, monitoring, incident response, performance analysis, infrastructure as code, site reliability, DevOps, IT observability, production monitoring

Soft Skills
collaboration, problem-solving, risk management, communication, analytical thinking