Cox Enterprises

Senior Site Reliability Engineer – Incident Response

full-time

Location Type: Hybrid

Location: Atlanta, California, Pennsylvania, United States

Salary

💰 $99,000 - $165,000 per year

About the role

  • The Site Reliability Engineer - Incident Response is a critical enterprise-level role responsible for accelerating incident resolution and enhancing the overall incident management process
  • This individual partners with engineering teams during active incidents to troubleshoot issues using monitoring and logging tools
  • Post-incident, delivers executive-level summaries that clearly communicate impact, root cause, and resolution
  • Plays a key role in analyzing incident response effectiveness and identifying opportunities for systemic improvements
  • Actively supports engineering teams during incidents by helping diagnose and resolve issues quickly
  • Navigates and analyzes data from observability platforms to make informed inferences about root causes
  • Analyzes the effectiveness of incident response to identify systemic reliability gaps
  • Standardizes incident response workflows (incident roles, communications, escalation paths)
  • Creates or refines runbooks, incident command frameworks, and severity classification guides

Requirements

  • Skilled in interpreting logs, metrics, and traces to assist in identifying root causes during live incidents
  • Proficient in tools such as Datadog, Splunk, New Relic, or similar platforms
  • Effectively leverages artificial intelligence (AI) and machine learning (ML) tools to automate, optimize, and enhance daily engineering and incident response tasks
  • Ability to distill complex technical issues into concise, business-relevant summaries for senior leadership
  • Strong attention to detail in validating incident data and identifying trends or gaps in response
  • Understanding of full-stack systems, CI/CD pipelines, caching, scaling, and cloud-native infrastructure
  • Capable of calculating and interpreting key metrics like MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve)
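
As a rough illustration of the MTTA and MTTR calculation mentioned above, here is a minimal Python sketch. The incident records, field order, and function names are hypothetical, not taken from any Cox system. It also assumes MTTR is measured from the time the incident was opened; some teams measure it from acknowledgement instead.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (opened, acknowledged, resolved).
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 15), datetime(2024, 1, 2, 14, 45)),
]


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def mtta_minutes(records):
    """Mean Time to Acknowledge: average of (acknowledged - opened)."""
    return mean_minutes([ack - opened for opened, ack, _ in records])


def mttr_minutes(records):
    """Mean Time to Resolve: average of (resolved - opened), by assumption."""
    return mean_minutes([resolved - opened for opened, _, resolved in records])


print(f"MTTA: {mtta_minutes(incidents):.1f} min")  # 10.0 for the sample data
print(f"MTTR: {mttr_minutes(incidents):.1f} min")  # 52.5 for the sample data
```

In the sample data the acknowledgement delays are 5 and 15 minutes and the resolution times are 60 and 45 minutes, which gives the averages shown in the comments.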

Benefits

  • The Company offers eligible employees the flexibility to take as much vacation with pay as they deem consistent with their duties, the company’s needs, and its obligations
  • seven paid holidays throughout the calendar year
  • up to 160 hours of paid wellness annually for their own wellness or that of family members
  • additional paid time off in the form of bereavement leave
  • time off to vote
  • jury duty leave
  • volunteer time off
  • military leave
  • parental leave
  • health care insurance (medical, dental, vision)
  • retirement planning (401(k))
  • paid days off (sick leave, parental leave, flexible vacation/wellness days, and/or PTO)

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
incident management, troubleshooting, log interpretation, metrics analysis, root cause analysis, AI tools, machine learning tools, full-stack systems, CI/CD pipelines, cloud-native infrastructure
Soft Skills
communication, attention to detail, analytical thinking, problem-solving, collaboration