
Manager, Site Reliability Engineering
RELX
Full-time
Location Type: Hybrid
Location: Raleigh • North Carolina, Pennsylvania • 🇺🇸 United States
Salary
💰 $133,400 - $247,800 per year
Job Level
Mid-Level, Senior
Tech Stack
Ansible, AWS, Azure, Cloud, Docker, EC2, Java, Kubernetes, .NET, Python, React, Splunk, SQL, Swift, Terraform, TypeScript, Vault
About the role
- Hire, mentor, and lead a high-performing, globally distributed team of SRE and DevOps engineers.
- Foster a culture of reliability, blameless postmortems, and continuous improvement.
- Build and sustain a global SRE community of practice that aligns reliability standards across business units.
- Drive cross-functional initiatives and influence enterprise-wide engineering practices.
- Define and implement SRE best practices to improve reliability, scalability, and performance.
- Establish and monitor key performance indicators (uptime, MTTR, SLO/SLI compliance).
- Serve as an escalation point for major incidents, ensuring swift resolution and actionable post-incident reviews.
- Partner with Product, Cloud Infrastructure, Security, and Architecture teams to ensure alignment with enterprise objectives.
- Collaborate with Cloud Engineering and Architecture to build robust monitoring, alerting, and observability systems.
- Lead modernization initiatives, including cloud migrations, IaC automation (Terraform, Kubernetes), and CI/CD pipeline improvements.
- Drive cloud cost efficiency and governance (FinOps).
- Ensure compliance with ISO 27001, NIST 800-53, and similar security frameworks.
- Define and implement SLOs, SLIs, and SLAs for AI/ML pipelines, APIs, and model training systems.
- Partner with AI/ML and Cloud teams to ensure the reliability, observability, and performance of AI workloads.
- Lead reliability engineering for MLOps: orchestration, IaC, monitoring, and automated scaling.
- Champion security, compliance, and fault tolerance across emerging AI platforms.
- Provide clear direction, feedback, and professional growth opportunities for team members.
- Encourage innovation, continuous learning, and adoption of new reliability and automation techniques.
- Lead with a global mindset, balancing local autonomy with enterprise alignment.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field (advanced degree preferred).
- Experience in a senior SRE, platform engineering, or DevOps role, including several years in a global leadership role.
- Proven experience leading distributed technical teams and aligning cross-functional stakeholders.
- Strong expertise in Azure and/or AWS, Kubernetes (EKS/AKS), Terraform, and CI/CD tooling.
- Background in observability, automation, incident management, and service reliability.
- Experience with AI/ML infrastructure (Databricks, MLflow, MLOps).
- Cloud & Infrastructure: Azure (VMs, Functions), AWS (EKS, EC2, S3, RDS, Lambda)
- Infrastructure as Code: Terraform (modules, workspaces, policies), Ansible, ARM/Bicep/HCL, Spacelift
- Containers & Orchestration: Docker, Kubernetes, Helm, ArgoCD
- Monitoring & Observability: Datadog, Splunk, Coralogix, CloudWatch, Azure Monitor
- Automation & Scripting: Python, Bash, PowerShell, TypeScript
- Security & Networking: Azure Key Vault, HashiCorp Vault, cloud security best practices
- Programming Familiarity: Java, .NET/C#, SQL, React environments
- Empathetic and motivational leader who develops technical talent and fosters collaboration.
- Excellent communicator capable of engaging both technical and business stakeholders.
- Deep commitment to transparency, reliability culture, and continuous improvement.
Benefits
- Comprehensive, multi-carrier health plan benefits
- Disability insurance
- Dependent care and commuter spending accounts
- Life and accident insurance
- Retirement benefits (salary investment plan/employer stock purchase plan)
- Modern family benefits, including adoption and surrogacy
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
SRE, DevOps, cloud migrations, IaC, monitoring, observability, automation, incident management, AI/ML infrastructure, service reliability
Soft skills
leadership, mentoring, collaboration, communication, innovation, continuous improvement, empathy, motivation, transparency, team development
Certifications
Bachelor's degree, advanced degree preferred, ISO 27001, NIST 800-53