
Site Reliability Engineer (SRE)
tombola
full-time
Location Type: Hybrid
Location: Sunderland • United Kingdom
About the role
- Ensure critical systems are always reliable, available, and performing well
- Implement smart automation, monitoring, and incident response strategies
- Lead incident management and root cause analysis
- Set up and maintain monitoring and alerting systems
- Optimize resource usage for scalability and performance
- Collaborate with development teams to ensure the reliability of new features
- Document infrastructure and procedures
Requirements
- Experienced SRE with a passion for building reliable, scalable, and efficient systems
- Strong knowledge of systems reliability and availability
- Proficient with monitoring systems such as Dynatrace
- Familiar with incident management processes
- Experience with automation using tools such as Terraform, Git, and TeamCity
- Skilled in performance optimization and capacity planning
- Knowledgeable about AWS cloud resources and disaster recovery planning
- Strong understanding of security best practices and compliance
- Excellent documentation skills
- Continuous improvement mindset
Benefits
- Flexible work arrangements
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
systems reliability, availability, automation, performance optimization, capacity planning, incident management, root cause analysis, monitoring systems, disaster recovery, security best practices
Soft Skills
collaboration, documentation, continuous improvement mindset