Empower

Lead Site Reliability Engineer

Full-time

Location Type: Remote

Location: United States

Salary

$114,000–$165,300 per year

About the role

  • Lead cross-functional reliability initiatives across multiple value streams and coordinate execution across teams.
  • Define and evolve SRE best practices, tools, and methodologies across the organization.
  • Architect enterprise-scale, multi-region AWS infrastructure that balances reliability, cost, performance, and security.
  • Establish and operate SLOs, SLIs, and error budgets for critical services, using them to drive prioritization decisions.
  • Serve as incident commander for major incidents and drive postmortems that produce completed action items and organizational learning.
  • Lead disaster recovery planning for critical financial services infrastructure.
  • Build shared Infrastructure as Code foundations in Terraform (reusable modules, standards, and patterns adopted across teams).
  • Design and implement production-scale Kubernetes patterns, including multi-tenancy, security policies, and advanced scheduling.
  • Establish observability standards and strategies using Datadog and Splunk (metrics, logging, tracing, dashboards, and alerting).
  • Set CI/CD standards and patterns, including pipeline-as-code and progressive delivery at scale.
  • Lead chaos engineering, game days, and systematic reliability testing initiatives.
  • Drive FinOps initiatives to optimize cloud spend while maintaining reliability targets.
  • Lead a functional team of SREs (without direct reports) on projects and operational initiatives.
  • Mentor SREs at multiple levels through coaching, design reviews, code reviews, and training sessions.
  • Partner with Engineering, Product, and Security leadership to align reliability work with business priorities, zero-trust architecture, and compliance controls.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent practical experience).
  • 7 to 10 years of Site Reliability Engineering experience (or equivalent), with demonstrated technical leadership.
  • Proven ability to lead technical teams and drive complex projects to completion.
  • Expert AWS knowledge, including designing large-scale, multi-region architectures.
  • Deep Kubernetes expertise, including advanced features, security, and production-scale operations.
  • Mastery of Infrastructure as Code using Terraform, including building shared platforms and frameworks.
  • Strong software engineering background with production experience in Python and/or Go.
  • Extensive experience with observability platforms (Datadog, Splunk) and implementing monitoring at scale.
  • Deep understanding of CI/CD principles and experience implementing enterprise-grade pipelines.
  • Proven track record leading major incidents and conducting effective postmortems.
  • Strong understanding of security, networking, and infrastructure design patterns.
  • Strong communication skills, with the ability to explain complex technical concepts to diverse audiences.
  • Experience mentoring engineers and building technical capabilities in teams.

Benefits

  • Medical, dental, vision and life insurance
  • Retirement savings – 401(k) plan with generous company matching contributions (up to 6%), financial advisory services, potential company discretionary contribution, and a broad investment lineup
  • Tuition reimbursement up to $5,250/year
  • Business-casual environment that includes the option to wear jeans
  • Generous paid time off upon hire – including a paid time off program plus ten paid company holidays and three floating holidays each calendar year
  • Paid volunteer time – 16 hours per calendar year
  • Leave of absence programs – including paid parental leave, paid short- and long-term disability, and Family and Medical Leave (FMLA)
  • Business Resource Groups (BRGs) – BRGs facilitate inclusion and collaboration across our business internally and throughout the communities where we live, work and play. BRGs are open to all.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Site Reliability Engineering, AWS, Kubernetes, Terraform, Python, Go, CI/CD, observability, FinOps, chaos engineering

Soft Skills
technical leadership, communication, mentoring, project management, collaboration