
API Reliability Engineer
Empower
full-time
Location Type: Remote
Location: United States
Salary
💰 $87,400 - $123,400 per year
About the role
- Own and improve the reliability, performance, and scalability of API services in production.
- Troubleshoot and resolve P1/P2 production incidents end-to-end, analyzing issues across application, infrastructure, and integrations.
- Work closely with API developers to identify and address reliability issues and application-level security vulnerabilities in service design and implementation.
- Contribute targeted code-level or configuration fixes to resolve issues and prevent recurrence.
- Participate in root cause analysis (RCA) and drive durable, long-term fixes.
- Improve API resilience through patterns such as timeouts, retries, circuit breakers, and graceful degradation.
- Establish and enhance observability and service health metrics, including logs, metrics, traces, and SLOs, using Datadog and Splunk.
- Define and monitor SLAs/SLOs for API performance and availability.
- Work with API Gateway and ALB/NLB for traffic management, routing, and system reliability.
- Contribute to CI/CD pipelines using Jenkins to ensure safe and consistent deployments.
- Contribute to disaster recovery readiness and system resilience planning.
- Collaborate across engineering teams to improve system design and operational readiness.
- Participate in an on-call rotation for critical incidents (P1/P2).
Requirements
- Minimum 5 years of experience in backend or API development
- Strong hands-on experience with Java and Spring Boot
- Proven experience building, shipping, and operating APIs in production environments
- Strong problem-solving skills with the ability to debug real production issues end-to-end
- Experience handling P1/P2 incidents in production environments
- Solid understanding of API architecture, request lifecycle, and common failure patterns
- Experience with AWS services such as API Gateway, ALB/NLB, EC2, ECS/EKS, Lambda, RDS, or DynamoDB
- Familiarity with reliability patterns such as timeouts, retries, circuit breakers, and connection pooling
- Experience with observability tools such as Datadog and/or Splunk
- Experience with CI/CD pipelines, preferably Jenkins
- Strong debugging skills in distributed systems
- Experience with Git-based workflows and Agile development
- Bachelor’s degree in Computer Science, Information Systems, or a related field; equivalent practical experience welcomed
Benefits
- Medical, dental, vision and life insurance
- Retirement savings – 401(k) plan with generous company matching contributions (up to 6%)
- Tuition reimbursement up to $5,250/year
- Business-casual environment that includes the option to wear jeans
- Generous paid time off upon hire – including a paid time off program plus ten paid company holidays and three floating holidays each calendar year
- Paid volunteer time – 16 hours per calendar year
- Leave of absence programs – including paid parental leave, paid short- and long-term disability, and Family and Medical Leave (FMLA)
- Business Resource Groups (BRGs) – BRGs facilitate inclusion and collaboration across our business internally and throughout the communities where we live, work and play.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Java, Spring Boot, API development, AWS, Datadog, Splunk, CI/CD, Jenkins, debugging, API architecture
Soft Skills
problem-solving, collaboration, communication, analytical thinking, incident management, root cause analysis, operational readiness, reliability improvement, system design, debugging skills