Compass Education

Site Reliability Engineer

full-time

Location Type: Hybrid

Location: Hawthorn, Australia

About the role

**What you'll do:**

**Infrastructure & Automation**
  • Operate and improve our cloud infrastructure to ensure systems remain stable, scalable and secure as usage grows.
  • Strengthen environment consistency and deployment safety through improved configuration and automation.
  • Reduce operational toil by automating repetitive processes and improving tooling.

**Observability & Monitoring**
  • Build and refine monitoring, alerting and logging to detect issues early and reduce customer impact.
  • Improve dashboards and production visibility for Engineering squads.
  • Raise the bar for observability before services reach production.

**Production & Incident Management**
  • Participate in on-call and respond to incidents in a structured, calm manner.
  • Lead lower-complexity incidents end-to-end and support higher-impact events.
  • Contribute to post-incident reviews and implement systemic improvements.

**Reliability, Resilience & Risk**
  • Contribute to improving service reliability targets and reducing repeat incidents.
  • Support capacity planning, performance optimisation and disaster recovery readiness.
  • Identify operational and security risks and contribute to preventative controls.

Requirements

**About You**

You’re a pragmatic, systems-minded engineer who stays calm under pressure and takes ownership of keeping production environments stable, secure and continuously improving.

You bring:
  • 3-4+ years’ experience in Site Reliability, Platform Engineering, DevOps or similar roles, with a strong focus on production systems and operational excellence.
  • Experience supporting live production environments, including participation in on-call rotations and incident response. You understand what it means to own systems that customers rely on daily.
  • Confidence debugging and resolving issues under pressure, using structured problem-solving to diagnose root causes and restore service quickly.
  • Experience working with cloud infrastructure (e.g. AWS or similar), including managing environments that support scalable, customer-facing applications.
  • Familiarity with containerised environments and orchestration tools, and how they impact deployment, scaling and service reliability.
  • Experience contributing to infrastructure management and automation, helping create consistent, repeatable environments.
  • Familiarity with monitoring and alerting platforms, and an understanding of how strong observability improves reliability outcomes.
  • Scripting or automation capability, with the ability to reduce manual processes and improve operational efficiency.

Benefits

**What’s in it for you?**

You’ll join a purpose-driven company at a genuinely exciting stage of growth, with the opportunity to make a real impact on education at scale.

What we offer:
  • A hybrid working environment, with teams spending three days a week in our Melbourne office.
  • Learning and development opportunities, including a dedicated PD budget.
  • 24/7 access to our Employee Assistance Program (EAP), including face-to-face, phone and live chat support.
  • A parental leave program for both primary and secondary carers.
  • A supportive, inclusive culture where your voice is valued and heard.
  • The chance to grow alongside a fast-moving, ambitious organisation.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Site Reliability Engineering, Platform Engineering, DevOps, cloud infrastructure, AWS, container orchestration, infrastructure management, automation, scripting, monitoring and alerting
Soft Skills
calm under pressure, ownership, structured problem-solving, incident response, operational excellence, pragmatic, systems-minded