Akamai Technologies

Senior Site Reliability Engineer – Cloud and Networking

Senior Site Reliability Engineer responsible for the reliability of Akamai's load balancing infrastructure. The role involves designing SLO frameworks, leading incident management, and mentoring junior engineers.

Posted 5/28/2026 • Full-time • Remote • 🇵🇱 Poland • Senior

Tech Stack

Tools & technologies
Ansible, Cloud, Distributed Systems, Go, Grafana, Kubernetes, Linux, Prometheus, Python, SaltStack, Terraform

About the role

Key responsibilities & impact
  • Owning the SRE lifecycle for NodeBalancer and Network Load Balancer — from design reviews and pre-rollout readiness assessments through production sign-off and ongoing reliability management
  • Designing and implementing SLO/SLI frameworks that reflect true customer experience for L4 and L7 load balancing services, and driving action when error budgets are at risk
  • Building and maintaining observability pipelines for NB/NLB infrastructure, including Prometheus metrics from load balancing components and system-level sources, and Grafana dashboards that enable rapid incident triage
  • Leading technical incident response for complex NB/NLB failures — BGP/VIP issues, failover failures, data plane degradations, and configuration problems — acting as the technical commander and driving root cause analysis and preventive follow-through
  • Developing and automating safe deployment workflows for phased NB/NLB releases, including bake period monitoring, feature flag management, and GO/NO-GO validation across global datacenter rollouts
  • Reviewing design documents and product requirements documents, and producing actionable SRE input on operational risks, capacity implications, Day-2 concerns, and product strategy gaps
  • Building automation and tooling using Python or Go that reduces operational toil and improves team-wide operational capability
  • Mentoring SRE II engineers on the NB team, providing hands-on technical guidance, code/config reviews, and raising the bar for the team's SRE practice
  • Participating in a scheduled, daytime-only on-call rotation for NB/NLB production systems, leading technical incident response and driving resolution of complex failures in customer-facing load balancing infrastructure

Requirements

What you’ll need
  • Have extensive experience in SRE, platform engineering, or infrastructure engineering, working with large-scale distributed systems
  • Demonstrate deep expertise with Linux networking fundamentals — routing, BGP, nftables/iptables, ARP, VXLAN — and comfort diagnosing at the packet level using tcpdump, netstat, and similar tools
  • Have hands-on experience with L4/L7 load balancing technologies — including proxy-based or kernel-level load balancers — covering configuration, health checking, high availability, and failure modes at scale
  • Show a track record of defining SLO/SLI frameworks, building observability platforms from scratch, and running incident management processes at scale
  • Demonstrate expertise in Kubernetes and containerization at scale — including workload scheduling, networking (CNI, Services, ingress), resource management, and operating stateful or network-intensive workloads in a cluster environment
  • Have experience building automation and tooling in Python or Go, along with infrastructure-as-code experience (SaltStack, Ansible, or Terraform) and strong deployment safety instincts
  • Demonstrate 4+ years in SRE or infrastructure engineering, with at least 2 years at cloud scale

Benefits

Comp & perks
  • Your health
  • Your finances
  • Your family
  • Your time at work
  • Your time pursuing other endeavors
