Incident Response: Respond to incidents and alerts, triaging urgency and investigating root cause
Documentation: Contribute regularly to our documentation on system design, troubleshooting, best practices, and engineering processes
Root Cause Analysis: Contribute to post-mortems and help identify long-term improvements under guidance
Collaboration: Support cross-functional teams during investigations and post-incident reviews
Observability: Support and enhance observability tools and techniques by identifying metrics, logging, and alerting improvements
Automation: Write and execute simple automation scripts (e.g. Python, Ruby, Bash) to improve reliability and reduce toil
Development: Work on internal tools, pipelines, and IaC solutions to help improve the speed of software delivery and recovery
System Reliability: Help enhance the reliability and performance of our application and systems to maximize uptime and minimize disruptions
Infrastructure Optimization: Work closely with the development and platform engineering teams to optimize our infrastructure on AWS for scalability and efficiency
Requirements
A genuine excitement for complex problem solving within our tech stack, applying what you know to our unique problems.
Familiarity with at least one scripting language such as Ruby, JavaScript, Python, Bash
Experience with containerization (e.g. Docker) or IaC (e.g. Terraform, Helm, CloudFormation)
An eagerness to follow modern engineering practices and learn from others
Familiarity with observability tools such as DataDog, New Relic, Grafana, Prometheus, ELK, or OpenTelemetry
Understanding of core networking concepts (DNS, HTTP/S, Load Balancing, etc.)
A collaborative mindset with clear communication skills
A willingness to ask questions to gain a better understanding of new or complex concepts
Benefits
Share Options
20 days of PTO per year + public holidays
3 volunteer days to use for any charitable or voluntary cause you would like
A top-tier private health insurance package
401k contribution plan
Work from home stipend
A personal learning and development budget through Learnerbly. You’ll be supported in your quest for knowledge, whatever that looks like to you.
Fully subsidised therapy sessions and subscriptions to leading wellbeing platforms