Salary
💰 $170,000 - $200,000 per year
Tech Stack
AWS, Distributed Systems, Docker, Kubernetes, Linux, Postgres, Python, RabbitMQ, RDBMS, Terraform
About the role
- Ensure the availability, performance, and scalability of products on Kraken's platform.
- Own and lead the Product Reliability team: define strategic objectives, manage priorities, and deliver major initiatives on clear timelines.
- Collaborate with the Staff Platform Engineer and wider Platform Engineering to deliver technical implementations and outcomes.
- Line-manage engineers in the Product Reliability team: set performance expectations, review performance, and provide coaching and feedback.
- Deliver technical improvements including small features and bug fixes; support service offerings owned by the team.
- Support team delivery through code reviews, technology research, and architectural guidance.
- Build a strong culture of open communication and an inclusive team environment.
- Tackle interesting and difficult problems in the global energy market and drive continuous reliability improvements.
Requirements
- Excellent communication skills, working effectively with developers, product managers and other business stakeholders.
- Record of consistently delivering critical-path projects on time and at scale.
- Meticulous organisation and planning skills.
- Experience mentoring and coaching a team to perform at a high level of quality.
- Experience managing and supporting large-scale internet-facing distributed systems for millions of customers.
- Good experience with AWS and a programming language.
- Knowledge of security best practices, security and CI/CD tooling, and methodologies.
- Previous experience leading technical delivery for small, highly autonomous teams (helpful).
- Previous experience as a technical individual contributor, preferably as a Site Reliability Engineer (helpful).
- Track record of effective collaboration with other teams and departments to drive holistic outcomes (helpful).
- A proactive, innovative mindset with the ability to drive continuous improvement (helpful).
- Previous experience working in a remote-first, asynchronous global team (helpful).
- Familiarity with PostgreSQL or similar RDBMS, Docker and Kubernetes (Amazon EKS), Python, Datadog, messaging queues/event-driven processing (RabbitMQ), Terraform, and experience with a Linux distribution.