
Lead Site Reliability Engineer
Empower
full-time
Location Type: Remote
Location: United States
Salary
💰 $114,000 - $165,300 per year
About the role
- Lead cross-functional reliability initiatives across multiple value streams and coordinate execution across teams.
- Define and evolve SRE best practices, tools, and methodologies across the organization.
- Architect enterprise-scale, multi-region AWS infrastructure that balances reliability, cost, performance, and security.
- Establish and operate SLOs, SLIs, and error budgets for critical services, using them to drive prioritization decisions.
- Serve as incident commander for major incidents and drive postmortems that produce completed action items and organizational learning.
- Lead disaster recovery planning for critical financial services infrastructure.
- Build shared Infrastructure as Code foundations in Terraform (reusable modules, standards, and patterns adopted across teams).
- Design and implement production-scale Kubernetes patterns, including multi-tenancy, security policies, and advanced scheduling.
- Establish observability standards and strategies using Datadog and Splunk (metrics, logging, tracing, dashboards, and alerting).
- Set CI/CD standards and patterns, including pipeline-as-code and progressive delivery at scale.
- Lead chaos engineering, game days, and systematic reliability testing initiatives.
- Drive FinOps initiatives to optimize cloud spend while maintaining reliability targets.
- Lead a functional team of SREs (without direct reports) on projects and operational initiatives.
- Mentor SREs at multiple levels through coaching, design reviews, code reviews, and training sessions.
- Partner with Engineering, Product, and Security leadership to align reliability work with business priorities, zero-trust architecture, and compliance controls.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent practical experience).
- 7 to 10 years of Site Reliability Engineering experience (or equivalent), with demonstrated technical leadership.
- Proven ability to lead technical teams and drive complex projects to completion.
- Expert AWS knowledge, including designing large-scale, multi-region architectures.
- Deep Kubernetes expertise, including advanced features, security, and production-scale operations.
- Mastery of Infrastructure as Code using Terraform, including building shared platforms and frameworks.
- Strong software engineering background with production experience in Python and/or Go.
- Extensive experience with observability platforms (Datadog, Splunk) and implementing monitoring at scale.
- Deep understanding of CI/CD principles and experience implementing enterprise-grade pipelines.
- Proven track record leading major incidents and conducting effective postmortems.
- Strong understanding of security, networking, and infrastructure design patterns.
- Strong communication skills, with the ability to explain complex technical concepts to diverse audiences.
- Experience mentoring engineers and building technical capabilities in teams.
Benefits
- Medical, dental, vision and life insurance
- Retirement savings – 401(k) plan with generous company matching contributions (up to 6%), financial advisory services, potential company discretionary contribution, and a broad investment lineup
- Tuition reimbursement up to $5,250/year
- Business-casual environment that includes the option to wear jeans
- Generous paid time off upon hire – including a paid time off program plus ten paid company holidays and three floating holidays each calendar year
- Paid volunteer time – 16 hours per calendar year
- Leave of absence programs – including paid parental leave, paid short- and long-term disability, and Family and Medical Leave (FMLA)
- Business Resource Groups (BRGs) – BRGs facilitate inclusion and collaboration across our business internally and throughout the communities where we live, work and play. BRGs are open to all.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, AWS, Kubernetes, Terraform, Python, Go, CI/CD, observability, FinOps, chaos engineering
Soft Skills
technical leadership, communication, mentoring, project management, collaboration