Senior Site Reliability Engineer
skillventory - A Leading Talent Research Firm
Senior Site Reliability Engineer for Fidelity's R4 Responsive OpsWorX Team, managing production incidents and collaborating with business and product teams to ensure application stability and performance.
Tech Stack
Tools & technologies
- Cloud
- Grafana
- Java
- Prometheus
- Splunk
About the role
Key responsibilities & impact
- Respond to production incidents
- Collaborate with business partners and respond to their application-specific questions
- Work with product teams to promote availability, resilience, and stability
- Proactively identify performance bottlenecks, capacity risks, and failure points; recommend and implement remediation strategies
- Instrument applications and infrastructure to provide end-to-end visibility into system health, performance, and reliability
- Lead incident response, providing rapid triage and resolution during production outages or performance degradation
- Collaborate closely with development, infrastructure, security, and business teams to align operational and business objectives
Requirements
What you’ll need
- Bachelor’s degree or higher in a technology-related field (such as Engineering, Computer Science, or Information Technology) required
- Minimum 5 years of combined experience across Production Support, Application Development (Java), and Site Reliability Engineering (SRE) to ensure system stability, scalability, and performance
- 3 years of hands-on experience with Amazon EKS and RDS
- Lead and execute cloud migration initiatives, ensuring minimal downtime, performance optimization, and adherence to architectural best practices
- Implement and maintain CI/CD pipelines to enable reliable, automated, and secure application deployments
- Design, implement, and continuously improve observability solutions, including monitoring, logging, alerting, and distributed tracing, using tools such as Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, and Splunk
- Conduct root cause analysis (RCA) for critical incidents and drive corrective and preventive actions
Benefits
Comp & perks
- Health insurance
- Retirement plans
- Paid time off
- Flexible work arrangements
- Professional development
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Production Support
- Application Development
- Site Reliability Engineering
- Java
- Amazon EKS
- Amazon RDS
- CI/CD pipelines
- Observability solutions
- Root cause analysis
- Performance optimization
Soft Skills
- Collaboration
- Leadership
- Problem-solving
- Communication
- Proactive identification