
Site Reliability Engineer
Matillion
Full-time
Location Type: Hybrid
Location: Manchester • United Kingdom
Salary
💰 £49,600 - £74,400 per year
About the role
- Engineering Reliability: Designing and implementing self-healing infrastructure using Kubernetes to maintain high uptime and system integrity.
- Scaling Cloud Ecosystems: Optimizing our cloud footprint (AWS/GCP/Azure) to ensure our platforms can handle rapid growth without breaking a sweat.
- Innovating with AI: Proactively identifying opportunities to integrate AI tools into our observability stack to automate incident detection and root-cause analysis.
- Eliminating Toil: Writing clean, efficient code to automate repetitive operational tasks, turning manual workflows into seamless "set and forget" processes.
- Defining Observability: Building advanced monitoring and alerting frameworks that provide deep insights into system health and performance.
Requirements
- Kubernetes Power User: Extensive experience managing production-grade K8s environments, including ingress, service mesh, and container security.
- Cloud Infrastructure Expert: A deep understanding of cloud networking, storage, and compute services within a major provider (AWS, Azure, or GCP).
- Proactive Mindset: An engineer who doesn't wait for a ticket; you naturally seek out system weaknesses and build solutions to strengthen them.
- AI Curiosity: An active interest in the AI landscape and a desire to leverage LLMs or machine learning to improve SRE workflows.
- Programming Literacy: Ideally, experience with at least one programming language (such as Java, Python, Go, or Ruby) to bridge the gap between software engineering and operations.
Benefits
- Company Equity
- 30 days holiday + bank holidays
- 5 days paid volunteering leave
- Health insurance
- Life insurance
- Pension
- Access to mental health support
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, AWS, GCP, Azure, AI tools, incident detection, root-cause analysis, programming (Java, Python, Go, Ruby), monitoring frameworks, alerting frameworks
Soft Skills
proactive mindset, problem-solving, curiosity