Lead Site Reliability Engineer

Kraken

full-time

Posted on: 9/26/2025

Location: 🇺🇸 United States

💰 $170,000 - $200,000 per year

Senior

AWS, Distributed Systems, Docker, Kubernetes, Linux, Postgres, Python, RabbitMQ, RDBMS, Terraform

About the role

Ensure the availability, performance, and scalability of products on Kraken's platform.
Own and lead the Product Reliability team: define strategic objectives, manage priorities, and deliver major initiatives on clear timelines.
Collaborate with the Staff Platform Engineer and wider Platform Engineering to deliver technical implementations and outcomes.
Line-manage engineers in the Product Reliability team: set performance expectations, review performance, and provide coaching and feedback.
Deliver technical improvements including small features and bug fixes; support service offerings owned by the team.
Support team delivery through code reviews, technology research, and architectural guidance.
Build a strong culture of open communication and an inclusive team environment.
Tackle interesting and difficult problems in the global energy market and drive continuous reliability improvements.

Requirements

Excellent communication skills and the ability to work effectively with developers, product managers, and other business stakeholders.
A record of consistently delivering critical-path projects on time and at scale.
Meticulous organisation and planning skills.
Experience mentoring and coaching a team to perform at a high level of quality.
Experience managing and supporting large-scale internet-facing distributed systems for millions of customers.
Solid experience with AWS and at least one programming language.
Knowledge of security best practices, security and CI/CD tooling, and related methodologies.
Previous experience leading technical delivery for small, highly autonomous teams (helpful).
Previous experience as a technical individual contributor, preferably as a Site Reliability Engineer (helpful).
Track record of effective collaboration with other teams and departments to drive holistic outcomes (helpful).
A proactive, innovative mindset with the ability to drive continuous improvement (helpful).
Previous experience working in a remote-first, asynchronous, global team (helpful).
Familiarity with PostgreSQL or a similar RDBMS, Docker and Kubernetes (Amazon EKS), Python, Datadog, message queues and event-driven processing (RabbitMQ), Terraform, and a Linux distribution.