tombola

Site Reliability Engineer (SRE)

full-time

Location Type: Hybrid

Location: Sunderland, United Kingdom

About the role

  • Ensure critical systems are always reliable, available, and performant
  • Implement smart automation, monitoring, and incident response strategies
  • Lead incident management and root cause analysis
  • Set up and maintain monitoring and alerting systems
  • Optimize resource usage for scalability and performance
  • Collaborate with development teams to ensure the reliability of new features
  • Document infrastructure and procedures

Requirements

  • Experienced SRE with a passion for building reliable, scalable, and efficient systems
  • Strong knowledge of systems reliability and availability
  • Proficient in monitoring systems like Dynatrace
  • Familiar with incident management processes
  • Experience in automation with tools like Terraform, Git, and TeamCity
  • Skilled in performance optimization and capacity planning
  • Knowledgeable about AWS cloud resources and disaster recovery planning
  • Strong understanding of security best practices and compliance
  • Excellent documentation skills
  • Continuous improvement mindset

Benefits

  • Flexible work arrangements
  • Professional development opportunities