
Senior Site Reliability Engineer – Incident Response
Cox Enterprises
Full-time
Location Type: Hybrid
Location: Atlanta • California • Pennsylvania • United States
Salary
💰 $99,000 - $165,000 per year
About the role
- The Site Reliability Engineer – Incident Response is a critical enterprise-level role responsible for accelerating incident resolution and improving the overall incident management process
- This individual partners with engineering teams during active incidents to troubleshoot issues using monitoring and logging tools
- After each incident, delivers executive-level summaries that clearly communicate impact, root cause, and resolution
- Plays a key role in analyzing incident response effectiveness and identifying opportunities for systemic improvements
- Actively support engineering teams during incidents by helping diagnose and resolve issues quickly
- Navigate and analyze data from observability platforms to make informed inferences about root causes
- Analyze the effectiveness of incident response to identify systemic reliability gaps
- Standardize incident response workflows (incident roles, comms, escalation paths)
- Create or refine runbooks, incident command frameworks, and severity classification guides
Requirements
- Skilled in interpreting logs, metrics, and traces to assist in identifying root causes during live incidents
- Proficient in tools such as Datadog, Splunk, New Relic, or similar platforms
- Effectively leverages artificial intelligence (AI) and machine learning (ML) tools to automate, optimize, and enhance daily engineering and incident response tasks
- Ability to distill complex technical issues into concise, business-relevant summaries for senior leadership
- Strong attention to detail in validating incident data and identifying trends or gaps in response
- Understanding of full-stack systems, CI/CD pipelines, caching, scaling, and cloud-native infrastructure
- Capable of calculating and interpreting key metrics like MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve)
Benefits
- The Company offers eligible employees the flexibility to take as much vacation with pay as they deem consistent with their duties, the company’s needs, and its obligations
- seven paid holidays throughout the calendar year
- up to 160 hours of paid wellness time annually, for their own wellness or that of family members
- additional paid time off in the form of bereavement leave
- time off to vote
- jury duty leave
- volunteer time off
- military leave
- parental leave
- health care insurance (medical, dental, vision)
- retirement planning (401(k))
- paid days off (sick leave, parental leave, flexible vacation/wellness days, and/or PTO)
Applicant Tracking System Keywords
Hard Skills & Tools
incident management, troubleshooting, log interpretation, metrics analysis, root cause analysis, AI tools, machine learning tools, full-stack systems, CI/CD pipelines, cloud-native infrastructure
Soft Skills
communication, attention to detail, analytical thinking, problem-solving, collaboration