Employer Direct Healthcare

Senior Site Reliability Engineer

Senior Site Reliability Engineer managing the Azure-based healthcare platform for Lantern, defining SRE practices and ensuring system reliability and compliance.

Posted 4/23/2026 • Full-time • Dallas, Texas • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
AWS, Azure, Google Cloud Platform, Grafana, Prometheus, Python, Terraform

About the role

Key responsibilities & impact
  • Define and track SLOs/SLIs/error budgets for critical healthcare services
  • Build and maintain observability platforms (monitoring, logging, alerting, tracing) using Datadog and Azure Monitor
  • Lead incident management processes using Rootly, including on-call rotations, runbooks, and post-incident reviews
  • Automate operational toil through Infrastructure as Code (Terraform) and custom tooling
  • Design and implement disaster recovery and business continuity strategies
  • Collaborate with development teams to improve service reliability through architecture reviews and chaos engineering
  • Optimize system performance, capacity planning, and cost efficiency for Azure infrastructure
  • Ensure production systems meet HIPAA, SOC 2, and other regulatory requirements
  • Maintain and improve CI/CD pipelines to support safe, rapid deployments
  • Mentor junior engineers and foster a culture of reliability and operational excellence

Requirements

What you’ll need
  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience
  • 4+ years in SRE, DevOps, or production operations roles
  • 3+ years with Microsoft Azure (AWS/GCP a plus)
  • Strong experience with observability tools (Datadog, Azure Monitor, Prometheus, Grafana, or similar)
  • Experience defining and managing SLOs/SLIs and error budgets
  • Proven incident management and on-call experience (Rootly or similar incident management platforms)
  • Hands-on experience with Infrastructure as Code (Terraform) and CI/CD (Azure DevOps, GitHub Actions)
  • Experience in regulated environments (healthcare/HIPAA preferred)
  • Strong scripting skills (Python, Bash, PowerShell)
  • Excellent communication and collaboration skills

If you don’t meet every requirement listed, we still encourage you to apply.

Benefits

Comp & perks
  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • Short- and Long-Term Disability Insurance
  • Life Insurance
  • 401(k) with Company Match
  • Flexible Time Off
  • Paid Parental Leave
