Senior Platform Reliability Engineer
Hearst Health
Platform Reliability Engineer at Homecare Homebase, ensuring the reliability and performance of critical healthcare services by blending software engineering with system operations.
Tech Stack
Tools & technologies: Ansible, AWS, Azure, Grafana, GraphQL, JavaScript, Kubernetes, Prometheus, Python, ServiceNow, Splunk
About the role
Key responsibilities & impact
- Deliver solutions that enhance the overall reliability of the platform and/or reduce toil.
- Establish and implement modern observability patterns.
- Monitor overall platform health and manage uptime and availability.
- Evangelize best practices and industry standards.
- Plan and implement modern SRE practices.
- Develop and align SLOs/SLIs, error budgets, and capacity models to meet business needs.
- Operationalize services, including system testing, instrumentation, monitoring, capacity model development, training, and transition to operations teams.
- Participate in the full project lifecycle, from planning and implementation through operational readiness to decommissioning.
- Manage deployments of major releases.
- Lead and coordinate resolution efforts during major incidents by serving as the incident commander.
- Participate in an equitable 24x7 on-call rotation, serving as first responder for production alerts and as an escalation point for other teams.
- Understand the impact of technical implementations and processes on the business.
- Work with business owners to define SLAs in contracts.
- Present new designs and plans to the Architectural Advisory Board for feedback.
- Plan and manage the team's projects.
- Act as a technical leader and point of escalation, providing mentorship and technical direction.
Requirements
What you’ll need
- Bachelor’s degree in Computer Science, Systems Engineering, Math, or a related field required (equivalent experience considered).
- 3+ years of experience in a 24x7 production enterprise-class environment as an SRE or in a comparable role.
- 3+ years of Kubernetes administration/support in a production environment.
- 3+ years of Azure or AWS PaaS, IaaS, and resource administration/support in a production environment.
- Excellent problem-solving and analytical skills, with attention to detail and a drive to see issues through to resolution.
- Experience solving problems via automation using orchestration platforms such as Ansible, Azure Automation, and ServiceNow Flows.
- Proficient with scripting languages (multiple preferred): Bash, PowerShell, Python, and JavaScript.
- Proficient with data tier languages: TSQL and GraphQL.
- Proficient with the following monitoring solutions (multiple preferred): Datadog, Splunk, Prometheus/Grafana, Application Insights, Azure Monitor, and Microsoft SCOM.
- Proficient with modern SRE and observability concepts (e.g., OTEL, service level management).
Benefits
Comp & perks
- Competitive pay
- Robust benefits
- Professional development opportunities
- Flexibility
- Meaningful work
- Leaders who care
- A company that gives back
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes administration, Azure PaaS, AWS IaaS, Ansible, Azure Automation, Bash, PowerShell, Python, TSQL, GraphQL
Soft Skills
problem solving, analytical skills, attention to detail, driving issues to resolution, technical leadership, mentorship, project management