Senior Site Reliability Engineer

skillventory - A Leading Talent Research Firm

Senior Site Reliability Engineer on Fidelity's R4 Responsive OpsWorX team, managing production incidents and collaborating with business and product teams to ensure application stability and performance.

Posted 5/1/2026 • Full-time • Westlake • New Hampshire, Texas • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies

Cloud • Grafana • Java • Prometheus • Splunk

About the role

Key responsibilities & impact

Respond to production incidents
Collaborate with business partners, responding to application-specific questions
Work with product teams to promote availability, resilience, and stability
Proactively identify performance bottlenecks, capacity risks, and failure points; recommend and implement remediation strategies
Instrument applications and infrastructure to provide end-to-end visibility into system health, performance, and reliability
Lead incident response, providing rapid triage and resolution during production outages or performance degradation
Collaborate closely with development, infrastructure, security, and business teams to align operational and business objectives

Requirements

What you’ll need

Bachelor’s degree or higher in a technology-related field (such as Engineering, Computer Science, or Information Technology) required
Minimum 5 years of combined experience across Production Support, Application Development (Java), and Site Reliability Engineering (SRE) to ensure system stability, scalability, and performance
3 years of hands-on experience with Amazon EKS and RDS
Lead and execute cloud migration initiatives, ensuring minimal downtime, performance optimization, and adherence to architectural best practices
Implement and maintain CI/CD pipelines to enable reliable, automated, and secure application deployments
Design, implement, and continuously improve observability solutions, including monitoring, logging, alerting, and distributed tracing, using tools such as Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, and Splunk
Conduct root cause analysis (RCA) for critical incidents and drive corrective and preventive actions

Benefits

Comp & perks

Health insurance
Retirement plans
Paid time off
Flexible work arrangements
Professional development

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Production Support • Application Development • Site Reliability Engineering • Java • Amazon EKS • Amazon RDS • CI/CD pipelines • Observability solutions • Root cause analysis • Performance optimization

Soft Skills

Collaboration • Leadership • Problem-solving • Communication • Proactive identification