Akamai Technologies

Senior Site Reliability Engineer

Own operational reliability of cloud load balancing infrastructure serving global customers. Design and implement frameworks reflecting customer experience for reliability management.

Posted 6/11/2026 • Full-time • Remote • 🇨🇦 Canada • Senior • 💰 CA$120,400 - CA$216,600 per year

Tech Stack

Tools & technologies
Ansible, Distributed Systems, Go, Kubernetes, Linux, Python, SaltStack, Terraform

About the role

Key responsibilities & impact
  • Owning the SRE infrastructure lifecycle from design reviews and pre-rollout readiness assessments through production sign-off and ongoing reliability management
  • Designing and implementing frameworks that reflect customer experience for load balancing services and driving action when error budgets are at risk
  • Building and maintaining observability pipelines from load-balancing components and system-level sources to dashboards that enable rapid incident triage
  • Leading technical incident response for complex NB/NLB failures, acting as the technical commander and driving root cause analysis and preventive follow-through
  • Developing and automating safe deployment workflows for phased releases, including bake-period monitoring, feature flag management, and validation across global datacenter rollouts
  • Reviewing design documents and product-requirement documents, and producing actionable SRE input on operational risks, capacity implications, Day-2 concerns, and product strategy gaps
  • Building automation and tooling using Python or Go that reduces operational toil and improves team-wide operational capability

Requirements

What you’ll need
  • 8+ years of experience in SRE, infrastructure engineering, or platform engineering, working with large-scale distributed systems
  • Deep expertise with Linux networking fundamentals and with diagnosing issues at the packet level using tcpdump, netstat, and similar tools
  • Hands-on experience with L4/L7 load balancing technologies, covering configuration, health checking, high availability, and failure modes at scale
  • A track record of defining SLO/SLI frameworks, building observability platforms from scratch, and running incident management processes at scale
  • Expertise in Kubernetes and containerization at scale, including workload scheduling, networking, resource management, and operating stateful or network-intensive workloads in a cluster environment
  • Experience building automation and tooling using Python or Go, with infrastructure-as-code experience (SaltStack, Ansible, or Terraform) and deployment safety instincts

Benefits

Comp & perks
  • healthcare
  • RRSP
  • company holidays
  • vacation (in the form of PTO)
  • sick time
  • family-friendly benefits, including an employee assistance program with a focus on mental and financial wellness

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
SRE, infrastructure engineering, platform engineering, Linux networking, load balancing, SLO/SLI frameworks, observability platforms, Kubernetes, Python, Go
Soft Skills
technical incident response, root cause analysis, preventive follow-through, actionable input, capacity implications, operational risks, team-wide operational capability