Empower

Site Reliability Engineer

full-time

Location Type: Remote

Location: United States

Salary

💰 $87,400 - $123,400 per year

About the role

  • Own and improve the reliability, stability, scalability, and performance of our core data platforms and services
  • Provide operational support for large-scale, distributed data systems, ensuring high availability and strong SLAs
  • Partner closely with full-stack, data, and platform engineering teams to deliver continuous improvements
  • Operate and support EMR and EMR Serverless (Python/Spark) workloads and data pipelines
  • Support and optimize Amazon Redshift and DynamoDB in high-throughput, production environments
  • Design, build, and evolve monitoring, alerting, and observability frameworks with a focus on symptoms, not just outages
  • Lead incident response, troubleshooting production issues across the full stack and coordinating with internal and external stakeholders
  • Perform root cause analysis (RCA) and readiness reviews; turn findings into durable fixes and automation
  • Create and maintain runbooks, SOPs, and operational documentation
  • Collaborate with engineering teams to optimize performance, reliability, and cost
  • Participate in an on-call rotation to respond to incidents impacting customer-facing systems
  • Recommend and influence the use of AWS managed services and architectural patterns
  • Continuously evaluate system performance, capacity, and cost to scale efficiently

Requirements

  • 4–6 years of experience building or operating systems across multiple architecture domains: application, data, integration, infrastructure, and security
  • 4+ years of hands-on AWS experience, with strong production exposure to several of the following: Redshift, DynamoDB, EMR, EMR Serverless, EC2, S3, Lambda, Step Functions, EventBridge, RDS, IAM
  • Proven experience operating data platforms such as data lakes and data warehouses in production
  • Strong SQL skills and experience working with modern databases (e.g., Redshift, DynamoDB, Postgres, MySQL, Oracle)
  • 4+ years of Python experience, including scripting, automation, or data workloads
  • Experience with CloudWatch, infrastructure monitoring, and alerting
  • Hands-on experience with incident management, uptime SLAs, and customer-impacting systems
  • Strong understanding of Git-based workflows (GitHub, Git Flow, or similar)
  • Experience working in Agile environments (Scrum / Kanban) using tools such as Jira and Confluence
  • Bachelor’s degree in Computer Science, Information Systems, Data/Analytics, or a related field; equivalent practical experience welcomed

Benefits

  • Medical, dental, vision and life insurance
  • Retirement savings – 401(k) plan with generous company matching contributions (up to 6%)
  • Tuition reimbursement up to $5,250/year
  • Business-casual environment that includes the option to wear jeans
  • Generous paid time off upon hire – including a paid time off program plus ten paid company holidays and three floating holidays each calendar year
  • Paid volunteer time – 16 hours per calendar year
  • Leave of absence programs – including paid parental leave, paid short- and long-term disability, and Family and Medical Leave (FMLA)
  • Business Resource Groups (BRGs) – BRGs facilitate inclusion and collaboration across our business internally and throughout the communities where we live, work and play.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, SQL, AWS, Redshift, DynamoDB, EMR, EMR Serverless, CloudWatch, Git, Agile
Soft Skills
incident management, troubleshooting, collaboration, communication, problem-solving, root cause analysis, documentation, leadership, organizational skills, customer focus