Broadridge

Senior Site Reliability Engineer

full-time

Location Type: Hybrid

Location: Newark; California; New Jersey; United States

Salary

💰 $100,000 - $110,000 per year

About the role

  • Design and implement high-availability, fault-tolerant architectures across on-prem and cloud platforms (AWS)
  • Lead multi-region disaster recovery (DR) planning, implementation, and testing, including RTO/RPO definition and validation
  • Define and enforce SLOs, SLIs, and error budgets to balance reliability with delivery velocity
  • Drive self-healing automation and proactive remediation strategies
  • Build and maintain infrastructure using Terraform and configuration management tools (e.g., Chef)
  • Develop automation to eliminate manual operational tasks (toil reduction)
  • Create reusable modules, pipelines, and guardrails for standardized deployments
  • Automate certificate lifecycle management, key rotation, and security updates
  • Design and implement end-to-end observability (metrics, logs, traces, synthetic monitoring)
  • Build dashboards, alerts, and runbooks to enable fast detection and resolution of incidents
  • Improve signal-to-noise ratio in alerting to reduce operational fatigue
  • Perform root cause analysis (RCA) and lead post-incident reviews with actionable follow-ups
  • Engineer and operate platforms on AWS, including EKS, EC2, RDS/Aurora, Lambda, API Gateway, CloudFront, WAF, ALB/NLB, CloudWatch, X-Ray, IAM, and Secrets Manager
  • Lead cloud migrations and modernization initiatives, including legacy system refactoring
  • Implement secure networking patterns (VPCs, private subnets, controlled egress)
  • Identify and resolve performance bottlenecks through testing and analysis
  • Drive FinOps initiatives to optimize infrastructure cost without compromising reliability
  • Implement capacity planning and autoscaling strategies
  • Design and support CI/CD pipelines enabling safe, repeatable deployments
  • Embed reliability practices into the SDLC (testing, rollout strategies, rollback)
  • Partner with development teams to improve operability of applications before production
  • Partner with security and legal teams to meet regulatory and compliance requirements (e.g., data residency, GDPR-related controls)
  • Implement secure access controls, secrets management, and encryption best practices
  • Participate in security reviews, audits, and risk assessments
  • Act as a technical leader and mentor for engineers transitioning into SRE roles
  • Influence architecture and design decisions across multiple teams
  • Communicate effectively with engineering leadership, product owners, and non-technical stakeholders
  • Drive a culture of operational excellence, blameless postmortems, and continuous improvement

Requirements

  • 3+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Systems Engineering
  • Strong programming experience in Python, Java, or similar languages
  • Deep experience with Linux/Unix systems
  • Hands-on expertise with AWS and cloud-native architectures
  • Proven experience with Terraform and Infrastructure as Code
  • Strong understanding of networking, security, and distributed systems
  • Experience operating mission-critical, high-volume platforms
  • Preferred: Experience in financial services or highly regulated environments
  • Preferred: Experience with EKS/Kubernetes at scale
  • Preferred: Familiarity with Chaos Engineering and resilience testing
  • Preferred: Experience leading cloud cost optimization (FinOps) initiatives

Benefits

  • Bonus Eligible
  • Paid sick leave in compliance with the Colorado Healthy Families and Workplaces Act
  • Comprehensive benefit offerings available at www.broadridgebenefits.com