
Senior SRE
RELX
full-time
Location Type: Office
Location: Alpharetta • United States
Salary
💰 $104,900 - $174,700 per year
About the role
- Own reliability and resilience outcomes for an internal AKS fleet used by multiple partner teams
- Design, implement, and improve Kubernetes platform capabilities such as cluster lifecycle management, workload isolation, autoscaling, and safe multi-tenancy
- Lead and execute toil-reduction initiatives through automation, self-service workflows, and strong platform defaults
- Build and evolve observability across metrics, logs, and traces, with a focus on distributed system dependencies and actionable signals
- Improve incident response by automating detection, recovery, and mitigation to protect service levels
- Participate in an on-call rotation, act as an incident responder, and support others during high-impact events
- Contribute to SRE processes such as incident reviews, error budgets, and reliability planning, drawing on practical experience
- Provide informal mentorship and technical guidance to junior SREs and engineers on partner teams
- Collaborate with security, networking, and application teams to align platform standards and reduce cross-team friction
- Continuously identify opportunities to simplify architecture, reduce operational overhead, and optimize cloud cost
Requirements
- Strong hands-on experience operating Kubernetes in production, ideally Azure Kubernetes Service
- Practical experience across core SRE practices such as monitoring, alerting, incident response, capacity planning, and automation
- Solid understanding of distributed systems behavior, failure modes, and dependency management
- Experience automating infrastructure and operations using tools such as Terraform, Helm, and GitHub Actions
- Proficiency with at least one programming or scripting language used for automation and tooling (for example, Python or Bash)
- Experience designing systems that favor reliability, simplicity, and clear ownership over ad hoc fixes
- Comfort participating in on-call rotations and leading or supporting incidents in a calm, structured way
- Ability to influence without authority and work effectively with multiple partner teams
- A mindset oriented toward root cause analysis, long-term fixes, and continuous improvement