Senior Platform Reliability Engineer

Hearst Health

Platform Reliability Engineer at Homecare Homebase ensuring reliability and performance of critical healthcare services by blending software engineering with system operations.

Posted 5/6/2026 • Full-time • Dallas • Texas • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies

Ansible, AWS, Azure, Grafana, GraphQL, JavaScript, Kubernetes, Prometheus, Python, ServiceNow, Splunk

About the role

Key responsibilities & impact

Deliver solutions that enhance the overall reliability of the platform and/or reduce toil.
Establish modern observability patterns and implement those patterns.
Monitor the overall platform health as well as manage overall uptime and availability.
Evangelize best practices and industry standards.
Plan and implement modern SRE practices.
Develop and align SLOs/SLIs, error budgets, and capacity models to fulfill business needs.
Operationalize services, including system testing, instrumentation, monitoring, capacity model development, training, and transition to operations teams.
Participate in the full project lifecycle, from planning and implementation through operational readiness to decommissioning.
Manage deployments of major releases.
Lead and coordinate resolution efforts during major incidents by serving as the incident commander.
Participate in an equitable 24×7 on-call rotation, serving as first responder for production alerts and as an escalation point for other teams.
Understand the impact of technical implementations and processes on the business.
Work with business owners to define SLAs in contracts.
Present new designs and plans to the Architectural Advisory Board for feedback.
Plan and manage the team's projects.
Act as a technical leader who serves as a point of escalation and provides mentorship and technical direction.

Requirements

What you’ll need

Bachelor’s degree in Computer Science, Systems Engineering, Math, or a related field required (equivalent experience considered).
3+ years of experience in a 24×7 production enterprise-class environment as an SRE or in a comparable role.
3+ years of Kubernetes administration/support in a production environment.
3+ years of Azure or AWS PaaS, IaaS, and resource administration/support in a production environment.
Excellent problem-solving and analytical skills, with attention to detail and a drive to see issues through to resolution.
Experience solving problems via automation using orchestration platforms such as Ansible, Azure Automation, and ServiceNow Flows.
Proficient with scripting languages (multiple preferred): Bash, PowerShell, Python, and JavaScript.
Proficient with data-tier languages: T-SQL and GraphQL.
Proficient with the following monitoring solutions (multiple preferred): Datadog, Splunk, Prometheus/Grafana, Application Insights, Azure Monitor, and Microsoft SCOM.
Proficient with modern SRE and observability concepts (e.g., OTel, service level management).

Benefits

Comp & perks

Competitive pay
Robust benefits
Professional development opportunities
Flexibility
Meaningful work
Leaders who care
A company that gives back

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Kubernetes administration, Azure PaaS, AWS IaaS, Ansible, Azure Automation, Bash, PowerShell, Python, TSQL, GraphQL

Soft Skills

problem solving, analytical skills, attention to detail, driving issues to resolution, technical leadership, mentorship, project management