
Lead Site Reliability Engineer
Gifthealth
Full-time
Location Type: Remote
Location: United States
Salary
💰 $123,000 - $154,000 per year
About the role
- Designs, builds, and maintains reliable, scalable software systems supporting Ruby on Rails applications
- Embeds reliability, performance, and operational best practices into application code and development workflows
- Owns DevOps practices including CI/CD reliability, deployment strategies, and release safety
- Leads incident response, debugging, and root cause analysis across application and platform layers
- Implements and evolves observability (logging, metrics, tracing) within application and service code
- Partners with engineering teams on architecture, capacity planning, and technical standards
Requirements
- Bachelor’s degree in computer science, engineering, or related field OR equivalent professional experience in software engineering, SRE, or DevOps roles (Required)
- Cloud platform certifications (AWS, GCP, Azure) (Preferred)
- SRE or DevOps-focused certifications (Preferred)
- 5+ years of experience in software engineering, SRE, or DevOps roles (Required)
- Hands-on experience building and operating Ruby on Rails applications in production (Required)
- Experience owning production incidents and application-level reliability (Required)
- Experience in high-growth or scaling engineering organizations (Preferred)
- Experience working in regulated or customer-impact–sensitive environments (Preferred)
- Knowledge of Ruby on Rails application architecture and production operations; software reliability engineering principles (SLOs, SLIs, error budgets); and modern DevOps and CI/CD practices (Required)
- Knowledge of security and compliance considerations in production systems (Preferred)
- Strong software engineering skills (Ruby and/or comparable backend languages) (Required)
- Skills in debugging and performance optimization of production applications (Required)
- Skills in CI/CD pipelines, deployment automation, and release tooling (Required)
- Skills in monitoring and observability tooling (Datadog, New Relic, Prometheus, etc.) (Required)
- Skills in Infrastructure as Code (Terraform or similar) (Preferred)
- Skills in containerization and orchestration (Docker) (Preferred)
- Ability to write production-quality code that improves system reliability (Required)
- Ability to collaborate with product and engineering teams to influence design decisions (Required)
- Ability to troubleshoot complex, cross-system failures (Required)
- Ability to mentor engineers on operational ownership and reliability practices (Preferred)
- Ability to balance speed of delivery with long-term system health (Preferred)
Applicant Tracking System Keywords
Hard Skills & Tools
Ruby on Rails, CI/CD, DevOps, software reliability engineering, debugging, performance optimization, monitoring, observability, Infrastructure as Code, containerization
Soft Skills
collaboration, mentoring, troubleshooting, influencing design decisions, balancing speed of delivery with system health
Certifications
cloud platform certifications, SRE certifications, DevOps certifications