Site Reliability Engineer

Epidemic Sound

As a Site Reliability Engineer at Epidemic Sound, you ensure the platform's reliability and scalability while collaborating with product teams. You are responsible for CI/CD, traffic management, and observability.

Posted 6/26/2026 • Full-time • Stockholm, 🇸🇪 Sweden • Mid-Level, Senior

Tech Stack

Tools & technologies

Cloud, Distributed Systems, Firewalls, Kubernetes, Linux, Terraform, Unix

About the role

Key responsibilities & impact

Build and operate the platform our services run on - GKE clusters, the controllers that extend them, and the Terraform that defines our cloud.
Own the path from commit to production - CI/CD, GitOps, and the progressive-delivery patterns that turn a merge into a safe release.
Strengthen the networking and routing layer - traffic management on top of the VPC, firewalls, and network policies that keep it safe and predictable.
Govern access and guardrails - IAM across every layer, policy-as-code, and break-glass paths - so teams move fast within safe defaults rather than waiting on tickets.
Grow reliability and observability - alert hygiene, runbooks, SLOs, and the metrics and tracing that show how the platform behaves in production.
Enable product teams and raise the bar - make production readiness the default, and drive healthy adoption of the standards and docs you would rather share than gatekeep.

Requirements

What you’ll need

Kubernetes fundamentals: a solid grasp of controllers, core components, and CNI and networking - depth in the domain matters more than any single tool (GKE a plus).
Infrastructure as code and delivery: Terraform, Helm or Kustomize, CI/CD and GitOps (ArgoCD), and the traffic-management and progressive-delivery mechanisms that move releases out safely.
Networking and access: routing fundamentals, the VPC, firewall, and network-policy primitives beneath it, and IAM and access management at different levels.
Operational depth: monitoring fundamentals (a clear view of when to reach for metrics versus tracing, and experience with an open-source observability stack), strong troubleshooting across distributed systems, and solid Unix/Linux.
Agentic development mindset: you use AI agents actively in your own work, knowing where they add leverage and where human judgement is non-negotiable.
Collaboration and judgement: you do your best work on large, cross-cutting projects, communicate openly, and stay opinionated but open to discussion - reaching for the right tool over your own creation.

Benefits

Comp & perks

Equal opportunity employer
We value diversity and encourage everyone to come and soundtrack the world with us.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Kubernetes, Terraform, CI/CD, GitOps, traffic management, networking, monitoring, troubleshooting, Unix, Linux

Soft Skills

collaboration, judgement, communication, troubleshooting, agentic development mindset