Tech Stack
Tools & technologies: AWS, Azure, Cloud, Docker, Grafana, Kubernetes, Prometheus, Splunk
About the role
Key responsibilities & impact
- Lead, mentor, and develop a team of L2/L3 Production Support Engineers.
- Define, track, and optimise key operational metrics, including Mean Time to Resolution (MTTR) and Mean Time Between Failures (MTBF).
- Lead the diagnosis and resolution of application-level issues using software development techniques and best practices.
- Establish and refine incident management processes. Lead critical incident resolution and coordinate cross-functional response efforts.
- Champion rigorous root cause analysis (RCA) practices across the team.
- Identify opportunities to streamline support workflows, reduce manual effort through automation, and eliminate toil.
- Serve as the primary technical contact for internal and external stakeholders.
- Maintain oversight of production SaaS platforms, infrastructure stability, and system performance.
Requirements
What you’ll need
- At least 6 years in production support, DevOps, or Site Reliability Engineering roles, including at least 3 years leading or mentoring technical teams.
- Proven experience troubleshooting application code issues using software development techniques: debuggers, profilers, log analysis, code review, and systematic problem-solving methodologies.
- Demonstrated expertise building and scaling metrics-driven teams. Evidence of implementing or improving MTTR, MTBF, or similar KPIs with measurable results.
- Strong background supporting SaaS/cloud-native production systems in high-availability, high-traffic environments.
- Hands-on experience with incident management frameworks (ITIL, blameless post-mortems, RCA methodologies).
- Excellent communication skills: ability to distil technical complexity for non-technical stakeholders and present data-backed insights to leadership.
- Experience with containerized systems (Docker, Kubernetes) or cloud platforms (AWS, Azure, Google Cloud) is a plus.
- Familiarity with observability tools (Datadog, New Relic, Splunk, Prometheus/Grafana) and APM instrumentation is a plus.
- Background in software development or systems engineering (demonstrable coding ability in at least one language) is a plus.
- Formal leadership or management certification; Agile/Scrum experience is a plus.
- Located in or willing to work from the York, UK office.
Benefits
Comp & perks
- To learn more about our values, mission, and the wide range of perks offered to employees at Comply, visit https://www.comply.com/careers/.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
production support, DevOps, Site Reliability Engineering, troubleshooting, debuggers, profilers, log analysis, code review, systematic problem-solving, metrics-driven teams
Soft Skills
leadership, mentoring, communication, incident management, problem-solving, collaboration, data presentation, stakeholder engagement, process optimization, team development
Certifications
formal leadership certification, Agile certification, Scrum certification
