
Technical Lead, Application Support
Spektrix
full-time
Location Type: Hybrid
Location: London • United Kingdom
Salary
💰 £65,000 - £80,000 per year
About the role
- Maintaining the configuration and accuracy of the team's operational dashboards, alerts, and PagerDuty schedules. Identifying and documenting observability gaps across the wider platform for resolution by engineering delivery teams.
- Apply and track usage of known workarounds and influence the rest of the engineering team on improvements needed for the reliability and quality of the product.
- Ensuring problems and tasks are investigated thoroughly and solved accurately and methodically.
- Responding to incidents in a timely way, in line with our processes.
- Keeping our how-to guides and documentation up-to-date and concise.
- Keep the team working in line with our security and compliance policies and processes, particularly when working with customer data and production systems.
- Ensuring the principles of our technical strategy are embedded into our solutions and ways of working.
- Identify and document areas for improvement in the reliability, scalability and quality of Spektrix systems.
- Documenting, reporting, resolving, and mitigating defects, problems, risks, and instances of nonconformance.
- Continuously improving how we document, investigate, and triage issues.
- Sharing what we learn through dashboards, incident reviews, updated documentation, and collaborative work such as coaching.
- Seeking opportunities to automate things and collaborate on internal improvement projects.
- Applying Lean principles and using analysis and data to pinpoint where things are getting stuck, identifying opportunities to eliminate waste and deliver more effectively and efficiently.
- Collaborating with Product, Engineering, and our First-Line Support teams to make sure we are prioritising the right things.
- Contribute to platform resilience strategies such as capacity planning, redundancy, failover, and disaster recovery.
- Ensuring the accuracy, relevance, and usefulness of our alerts, monitoring, and observability.
- Participate in or lead post-incident reviews, and identify required actions.
- Design and maintain operational runbooks and readiness checklists.
- On a typical day, you'll be working closely with colleagues, pairing in a virtual meeting room, collaborating on items from the team's Kanban board, and identifying areas for improvement.
- We review incoming work requests together to understand their context and urgency. Using self-organising principles, the team decides how to divide the work - whether pairing, mobbing, or working solo - based on what's most effective.
- If clients are putting high-demand tickets on sale today, you may need to scale cloud resources to ensure everything runs smoothly, and put everything back in place after it’s over.
- You’ll lead a range of activities including discovery, investigation and spikes, writing or refining tickets, fault-finding and fixing, testing, documentation, and build and release tasks.
- Throughout the day, you’ll monitor alerts and investigate any that arise. If needed, you may join the Incident Room alongside a small group of cross-functional colleagues to calmly and methodically identify and resolve issues. This is done in close collaboration with customer-facing teams to ensure clarity and continuity.
- At other times, you'll participate in team sessions focused on reflecting, planning, and finding ways to improve how we work together and deliver on our goals.
Requirements
- Experience of leading an operations or support team, monitoring and supporting Azure-hosted SaaS applications.
- Technical and operational good practice and excellence.
- Aligning teams with goals
- Stakeholder management
- Deep understanding of SQL Server and/or Azure SQL, and database performance.
- Experience querying logs using query languages such as KQL, LogQL, or Lucene.
- Able to read and interpret logs and stack traces from C# .NET applications. While you will not be writing C# feature code, you must be able to read and navigate code to diagnose errors effectively.
- Experience with Infrastructure-as-Code (specifically Terraform).
- Experience with a range of alerting, performance, monitoring and security tools. We use Azure Monitor, PagerDuty, Grafana, Logz.io, and Cloudflare tools; experience with these particular tools is not essential, but experience with comparable tools is.
- Can calmly, confidently, and competently co-ordinate incident response, clearly communicating accurate, timely, and relevant information to a range of stakeholders across the organisation.
- Communicate fluently with engineers as well as client success teams, and build relationships with stakeholders.
- Can break down and document complicated technical concepts concisely.
- Highly collaborative; works closely with other leads and managers in the team leadership group.
- Giving and receiving feedback in an honest, kind, and reflective manner. Learning from mistakes and being imaginative about ways to improve things.
- Coaching and mentoring your team.
- Curious, and keen to learn new skills and technologies.
Benefits
- Flexible working with support for WFH set up.
- NHS top up scheme (covering dental, optical, therapy & counselling, prescription and other health related costs)
- Continuous development supported by your line manager and a learning budget
- Enhanced Maternity, Adoption & Shared Parental Leave
- 35 days' paid leave annually, inclusive of annual leave, bank holidays and a birthday day off, all of which can be used flexibly
- 4 weeks paid sabbatical after 5 years of service
- 2 volunteering days per year
- Company pension scheme of 4%
- Free snacks, drinks and breakfast items in all our offices
- Varied range of regular socials across all our offices
- Cycle to work & Season Ticket Loans
- Travel stipend for commuting