Gifthealth

Lead Site Reliability Engineer

full-time

Location Type: Remote

Location: United States

Salary

💰 $123,000 - $154,000 per year

About the role

  • Designs, builds, and maintains reliable, scalable software systems supporting Ruby on Rails applications
  • Embeds reliability, performance, and operational best practices into application code and development workflows
  • Owns DevOps practices including CI/CD reliability, deployment strategies, and release safety
  • Leads incident response, debugging, and root cause analysis across application and platform layers
  • Implements and evolves observability (logging, metrics, tracing) within application and service code
  • Partners with engineering teams on architecture, capacity planning, and technical standards

Requirements

  • Bachelor’s degree in computer science, engineering, or related field OR equivalent professional experience in software engineering, SRE, or DevOps roles (Required)
  • Cloud platform certifications (AWS, GCP, Azure) (Preferred)
  • SRE or DevOps-focused certifications (Preferred)
  • 5+ years of experience in software engineering, SRE, or DevOps roles (Required)
  • Hands-on experience building and operating Ruby on Rails applications in production (Required)
  • Experience owning production incidents and application-level reliability (Required)
  • Experience in high-growth or scaling engineering organizations (Preferred)
  • Experience working in regulated or customer-impact–sensitive environments (Preferred)
  • Knowledge of Ruby on Rails application architecture and production operations; software reliability engineering principles (SLOs, SLIs, error budgets); and modern DevOps and CI/CD practices (Required)
  • Knowledge of security and compliance considerations in production systems (Preferred)
  • Strong software engineering skills (Ruby and/or comparable backend languages) (Required)
  • Skills in debugging and performance optimization of production applications (Required)
  • Skills with CI/CD pipelines, deployment automation, and release tooling (Required)
  • Skills with monitoring and observability tooling (Datadog, New Relic, Prometheus, etc.) (Required)
  • Skills with Infrastructure as Code (Terraform or similar) (Preferred)
  • Skills with containerization and orchestration (Docker) (Preferred)
  • Ability to write production-quality code that improves system reliability (Required)
  • Ability to collaborate with product and engineering teams to influence design decisions (Required)
  • Ability to troubleshoot complex, cross-system failures (Required)
  • Ability to mentor engineers on operational ownership and reliability practices (Preferred)
  • Ability to balance speed of delivery with long-term system health (Preferred)

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

  • Ruby on Rails, CI/CD, DevOps, software reliability engineering, debugging, performance optimization, monitoring, observability, Infrastructure as Code, containerization

Soft Skills

  • collaboration, mentoring, troubleshooting, influencing design decisions, balancing speed of delivery with system health

Certifications

  • cloud platform certifications, SRE certifications, DevOps certifications