
Senior Platform Engineer – Observability
Kraken
full-time
Location Type: Hybrid
Location: Tokyo • Japan
About the role
- Support and implement monitoring and alerting strategy across Kraken’s customer business
- Define and uphold observability best practices across multiple products and platforms
- Partner with product teams to implement observability tooling and improve reliability across the organisation
- Help product teams build best-in-class dashboards for their requirements or bespoke use cases
- Work with product teams to define and implement meaningful Service Level Objectives (SLOs) and Service Level Indicators (SLIs), aligned to contractual Service Level Agreements (SLAs)
- Build, tune, and continuously improve alerts and monitors using golden signals (latency, traffic, errors, saturation) as a framework - reducing noise and increasing actionable signal
- Help product teams transition to on-call models by improving signals, alert quality, and operational readiness
- Improve tooling and self-service capabilities for alerting and monitoring across multiple product teams
- Analyse incident metrics to identify trends and improvement opportunities, communicating insights clearly back to product teams
- Manage the cost and usage of our observability tooling stack in collaboration with FinOps
- Contribute to broader platform reliability infrastructure improvements where needed
- Help solve interesting and difficult problems - there’s a significant opportunity for disruption in the global energy market
Requirements
- Solid hands-on experience across our core platform stack:
- - AWS (supporting and improving cloud infrastructure used by product teams)
- - Terraform (infrastructure as code; comfortable operating with Terraform day-to-day)
- - Kubernetes (container orchestration and deployment management; comfortable working with Kubernetes day-to-day)
- Experience using industry-standard observability tooling - we use Datadog, Grafana, Prometheus and Rootly (experience with other monitoring/alerting platforms is transferable)
- Strong collaboration and communication skills - able to work effectively with developers, product managers, and other stakeholders to design and deliver impactful observability "golden paths" and monitoring experiences
- Exposure to Python (or a comparable language such as TypeScript, Go, or C#) - able to understand how applications behave in production to support observability and reliability improvements
- Previous experience working in small, highly autonomous teams
- Comfortable with ambiguity and able to create structure in unclear situations
- Proactive learning mindset (experiment, iterate, and adapt as the team evolves approaches)
- Strong asynchronous written communication (Slack/Notion/docs) and a habit of keeping others in the loop
- Autonomy and accountability - making progress independently and owning outcomes
Benefits
- Health insurance
- Flexible working arrangements
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AWS, Terraform, Kubernetes, Datadog, Grafana, Prometheus, Rootly, Python, TypeScript, Go
Soft Skills
collaboration, communication, proactive learning mindset, autonomy, accountability, ability to work with ambiguity, strong asynchronous written communication, problem-solving, stakeholder engagement, teamwork