
Site Reliability Engineer
Compass Education
full-time
Location Type: Hybrid
Location: Hawthorn • Australia
About the role
- **What you'll do:**
- **Infrastructure & Automation**
- - Operate and improve our cloud infrastructure to ensure systems remain stable, scalable and secure as usage grows.
- - Strengthen environment consistency and deployment safety through improved configuration and automation.
- - Reduce operational toil by automating repetitive processes and improving tooling.
- **Observability & Monitoring**
- - Build and refine monitoring, alerting and logging to detect issues early and reduce customer impact.
- - Improve dashboards and production visibility for Engineering squads.
- - Raise the bar for observability before services reach production.
- **Production & Incident Management**
- - Participate in on-call and respond to incidents in a structured, calm manner.
- - Lead lower-complexity incidents end-to-end and support higher-impact events.
- - Contribute to post-incident reviews and implement systemic improvements.
- **Reliability, Resilience & Risk**
- - Contribute to improving service reliability targets and reducing repeat incidents.
- - Support capacity planning, performance optimisation and disaster recovery readiness.
- - Identify operational and security risks and contribute to preventative controls.
Requirements
- **About You**
- You’re a pragmatic, systems-minded engineer who stays calm under pressure and takes ownership of keeping production environments stable, secure and continuously improving.
- You bring:
- - 3-4+ years’ experience in Site Reliability, Platform Engineering, DevOps or similar roles, with a strong focus on production systems and operational excellence.
- - Experience supporting live production environments, including participation in on-call rotations and incident response. You understand what it means to own systems that customers rely on daily.
- - Confidence debugging and resolving issues under pressure, using structured problem-solving to diagnose root causes and restore service quickly.
- - Experience working with cloud infrastructure (e.g. AWS or similar), including managing environments that support scalable, customer-facing applications.
- - Familiarity with containerised environments and orchestration tools, and how they impact deployment, scaling and service reliability.
- - Experience contributing to infrastructure management and automation, helping create consistent, repeatable environments.
- - Familiarity with monitoring and alerting platforms, and an understanding of how strong observability improves reliability outcomes.
- - Scripting or automation capability, with the ability to reduce manual processes and improve operational efficiency.
Benefits
- **What’s in it for you?**
- You’ll join a purpose-driven company at a genuinely exciting stage of growth, with the opportunity to make a real impact on education at scale.
- What we offer:
- - A hybrid working environment, with teams spending three days a week in our Melbourne office.
- - Learning and development opportunities, including a dedicated PD budget.
- - 24/7 access to our Employee Assistance Program (EAP), including face-to-face, phone and live chat support.
- - A parental leave program for both primary and secondary carers.
- - A supportive, inclusive culture where your voice is valued and heard.
- - The chance to grow alongside a fast-moving, ambitious organisation.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, Platform Engineering, DevOps, cloud infrastructure, AWS, container orchestration, infrastructure management, automation, scripting, monitoring and alerting
Soft Skills
calm under pressure, ownership, structured problem-solving, incident response, operational excellence, pragmatic, systems-minded