
Reliability Engineer
ShiftCare
full-time
Location Type: Remote
Location: Remote • 🇦🇺 Australia
Job Level
Mid-Level, Senior
Tech Stack
Distributed Systems
About the role
- Own and improve our CI/CD pipelines (CircleCI), reducing deploy times and failure rates.
- Design and implement observability tooling - from synthetic checks and smoke tests to meaningful alerts and dashboards.
- Build reliable retry and back-off mechanisms for critical user workflows.
- Help architect and implement failover and fallback mechanisms for critical vendors and workflows.
- Work with Support to build debug tooling and dashboards that empower non-engineers.
- Collaborate with engineering to define and template runbooks, kill switches, and disaster mitigation patterns.
- Champion performance tuning and scalability improvements.
- ... and many other things, driven by you!
Requirements
- You thrive on ownership: identifying problems, proposing solutions, and driving them to completion.
- You’re passionate about reliability, observability, and building robust distributed systems.
- You bring experience working in a modern SaaS environment, have learnt lessons along the way, and are eager to apply that expertise in a new context.
- You have deep knowledge of background job processing, eventing, caching, and distributed systems.
- You have proven experience improving CI/CD pipelines. We currently use CircleCI but have not ruled out a migration.
- You’re comfortable designing and improving observability stacks (New Relic, Datadog, Honeycomb, etc.).
- You’ve built resilient systems using retries, back-offs, queueing, circuit breakers, graceful degradation, kill switches, isolation of workloads, etc.
- You care deeply about developer ergonomics and fostering a culture of reliability.
- You have a bias toward action, delivering tools that improve both system behavior and developer happiness.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
CI/CD pipelines, CircleCI, observability tooling, synthetic checks, smoke tests, retries, back-off mechanisms, distributed systems, background job processing, eventing
Soft skills
ownership, problem-solving, reliability, collaboration, developer ergonomics, bias toward action, driving solutions to completion, fostering culture of reliability