RELX

Senior SRE

full-time

Location Type: Office

Location: Alpharetta, United States

Salary

💰 $104,900 - $174,700 per year

About the role

  • Own reliability and resilience outcomes for an internal AKS fleet used by multiple partner teams
  • Design, implement, and improve Kubernetes platform capabilities such as cluster lifecycle management, workload isolation, autoscaling, and safe multi-tenancy
  • Lead and execute toil reduction initiatives through automation, self-service workflows, and strong platform defaults
  • Build and evolve observability across metrics, logs, and traces, with a focus on distributed system dependencies and actionable signals
  • Improve incident response by automating detection, recovery, and mitigation to protect service levels
  • Participate in an on-call rotation, act as an incident responder, and support others during high-impact events
  • Contribute to SRE processes such as incident reviews, error budgets, and reliability planning, drawing on practical experience
  • Provide informal mentorship and technical guidance to junior SREs and engineers on partner teams
  • Collaborate with security, networking, and application teams to align platform standards and reduce cross-team friction
  • Continuously identify opportunities to simplify architecture, reduce operational overhead, and optimize cloud cost

Requirements

  • Strong hands-on experience operating Kubernetes in production, ideally Azure Kubernetes Service
  • Practical experience across core SRE practices such as monitoring, alerting, incident response, capacity planning, and automation
  • Solid understanding of distributed systems behavior, failure modes, and dependency management
  • Experience automating infrastructure and operations using tools such as Terraform, Helm, and GitHub Actions
  • Proficiency with at least one programming or scripting language used for automation and tooling (for example, Python or Bash)
  • Experience designing systems that favor reliability, simplicity, and clear ownership over ad hoc fixes
  • Comfort participating in on-call rotations and leading or supporting incidents in a calm, structured way
  • Ability to influence without authority and work effectively with multiple partner teams
  • A mindset oriented toward root cause analysis, long-term fixes, and continuous improvement
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kubernetes, Azure Kubernetes Service, monitoring, alerting, incident response, capacity planning, automation, Terraform, Helm, Python
Soft Skills
mentorship, collaboration, influence without authority, calm incident management, root cause analysis, continuous improvement