Kraken

Senior Platform Engineer – Observability

Full-time

Location Type: Hybrid

Location: Tokyo, Japan

About the role

  • Support and implement the monitoring and alerting strategy across Kraken’s customer business
  • Define and uphold observability best practices across multiple products and platforms
  • Partner with product teams to implement observability tooling and improve reliability across the organisation
  • Help product teams build best-in-class dashboards for their requirements or bespoke use cases
  • Work with product teams to define and implement meaningful Service Level Objectives (SLOs) and Service Level Indicators (SLIs), aligned to contractual Service Level Agreements (SLAs)
  • Build, tune, and continuously improve alerts and monitors using the golden signals (latency, traffic, errors, saturation) as a framework, reducing noise and increasing actionable signal
  • Help product teams transition to on-call models by improving signals, alert quality, and operational readiness
  • Improve tooling and self-service capabilities for alerting and monitoring across multiple product teams
  • Analyse incident metrics to identify trends and improvement opportunities, communicating insights clearly back to product teams
  • Manage the cost and usage of our observability tooling stack in collaboration with FinOps
  • Contribute to broader platform reliability infrastructure improvements where needed
  • Help solve interesting and difficult problems; there’s a significant opportunity for disruption in the global energy market

Requirements

  • Solid hands-on experience across our core platform stack:
    - AWS (supporting and improving cloud infrastructure used by product teams)
    - Terraform (infrastructure as code; comfortable operating with Terraform day-to-day)
    - Kubernetes (container orchestration and deployment management; comfortable working with Kubernetes day-to-day)
  • Experience using industry-standard observability tooling: we use Datadog, Grafana, Prometheus and Rootly (experience with other monitoring/alerting platforms is transferable)
  • Strong collaboration and communication skills, with the ability to work effectively with developers, product managers, and other stakeholders to design and deliver impactful observability "golden paths" and monitoring experiences
  • Exposure to Python (or a similar language such as TypeScript, Go or C#), enough to understand how applications behave in production and to support observability and reliability improvements
  • Previous experience working in small, highly autonomous teams
  • Comfortable with ambiguity and able to create structure in unclear situations
  • Proactive learning mindset (experiment, iterate, and adapt as the team evolves its approaches)
  • Strong asynchronous written communication (Slack/Notion/docs) and a habit of keeping others in the loop
  • Autonomy and accountability: making progress independently and owning outcomes

Benefits

  • Health insurance
  • Flexible working arrangements
  • Professional development opportunities