FREE ACCESS
5,000–10,000 jobs/day

See all jobs on JobTailor
Search thousands of fresh jobs every day.
Discover
- Fresh listings
- Fast filters
- No subscription required
Create a free account and start exploring right away.

Site Reliability Engineer
Epidemic SoundSite Reliability Engineer at Epidemic Sound ensuring the platform's reliability and scalability while collaborating with product teams. Responsible for CI/CD, traffic management, and observability.
Tech Stack
Tools & technologiesCloudDistributed SystemsFirewallsKubernetesLinuxTerraformUnix
About the role
Key responsibilities & impact- Build and operate the platform our services run on - GKE clusters, the controllers that extend them, and the Terraform that defines our cloud.
- Own the path from commit to production - CI/CD, GitOps, and the progressive-delivery patterns that turn a merge into a safe release.
- Strengthen the networking and routing layer - traffic management on top of the VPC, firewalls, and network policies that keep it safe and predictable.
- Govern access and guardrails - IAM across every layer, policy-as-code, and break-glass paths - so teams move fast within safe defaults rather than waiting on tickets.
- Grow reliability and observability - alert hygiene, runbooks, SLOs, and the metrics and tracing that show how the platform behaves in production.
- Enable product teams and raise the bar - make production readiness the default, and drive healthy adoption of the standards and docs you would rather share than gatekeep.
Requirements
What you’ll need- Kubernetes fundamentals: a solid grasp of controllers, core components, and CNI and networking - depth in the domain matters more than any single tool (GKE a plus).
- Infrastructure as code and delivery: Terraform, Helm or Kustomize, CI/CD and GitOps (ArgoCD), and the traffic-management and progressive-delivery mechanisms that move releases out safely.
- Networking and access: routing fundamentals, the VPC, firewall, and network-policy primitives beneath it, and IAM and access management at different levels.
- Operational depth: monitoring fundamentals (a clear view of when to reach for metrics versus tracing, and experience with an open-source observability stack), strong troubleshooting across distributed systems, and solid Unix/Linux.
- Agentic development mindset: you use AI agents actively in your own work, knowing where they add leverage and where human judgement is non-negotiable.
- Collaboration and judgement: you do your best work on large, cross-cutting projects, communicate openly, and stay opinionated but open to discussion - reaching for the right tool over your own creation.
Benefits
Comp & perks- Equal opportunity employer
- We value diversity and encourage everyone to come and soundtrack the world with us.
ATS Keywords
✓ Tailor your resumeApplicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
KubernetesTerraformCI/CDGitOpstraffic managementnetworkingmonitoringtroubleshootingUnixLinux
Soft Skills
collaborationjudgementcommunicationtroubleshootingagentic development mindset