Netomi

Incident Engineer

Full-time

Location Type: Remote

Location: India

About the role

  • Own the incident lifecycle: detection, triage, escalation, resolution, and postmortems
  • Act as the central command during major incidents (war rooms, stakeholder updates)
  • Define and enforce SLAs/SLOs, incident severity frameworks, and runbooks
  • Collaborate with Engineering, ML, and Integrations teams to resolve issues quickly
  • Monitor system health across integrations (agent desks, LLMs, ASR/TTS pipelines)
  • Drive root cause analysis (RCA) and preventive actions
  • Improve observability, alerting, and incident tooling
  • Maintain clear internal and customer-facing communication during incidents

Requirements

  • 3–6 years in Incident Management / SRE / Production Support roles
  • Strong understanding of distributed systems, APIs, and cloud environments (AWS)
  • Experience with observability tools (e.g., Datadog)
  • Familiarity with AI/ML systems, especially LLM integrations and voice stacks (ASR/TTS), is a plus
  • Experience with LLM monitoring and tracing tools such as Langfuse
  • Excellent communication and stakeholder management skills
  • Ability to stay calm under pressure and drive structured resolution

Benefits

  • Equal opportunity employer committed to diversity

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
incident management, site reliability engineering, production support, distributed systems, APIs, cloud environments, observability tools, monitoring tools, root cause analysis, incident tooling
Soft Skills
communication, stakeholder management, calm under pressure, structured resolution