Avaya

Site Reliability Engineer, Azure – DevSecOps – IaC – Governance – Observability

Full-time

Location Type: Remote

Location: United States

Salary

💰 $129,000 - $143,000 per year

About the role

  • Serve as a key member of the 24×7 on-call rotation, responding to and managing incidents across production and pre-production environments.
  • Lead incident bridges, coordinate root cause analysis (RCA), and ensure post-incident reviews drive systemic improvements.
  • Maintain clear communication with cross-functional teams and leadership during major incidents.
  • Build, tune, and maintain observability dashboards (Azure Monitor, GCP Operations Suite, Prometheus, Grafana, Datadog, Log Analytics).
  • Perform deep-dive troubleshooting of application and service-level issues using distributed tracing and log analysis (Grafana, Datadog) to pinpoint root causes that lie beyond the infrastructure layer.
  • Define SLOs, SLIs, and error budgets to proactively identify and mitigate reliability risks before they affect customers.
  • Integrate AI-Ops tools for anomaly detection, predictive alerting, and automated incident correlation.
  • Continuously enhance alert quality, reduce false positives, and automate runbooks for faster recovery.

Requirements

  • 5+ years in site reliability, DevOps, cloud operations, or customer support roles.
  • Demonstrated experience in application-level troubleshooting by analyzing logs and traces to identify bugs, performance bottlenecks, and error conditions.
  • Expertise in Azure and GCP cloud operations and distributed system reliability.
  • Understanding of Terraform, Ansible, and CI/CD pipelines (Jenkins, GitHub Actions).
  • Experience with observability and AI-Ops tools (Azure Monitor, GCP Operations Suite, Grafana, Prometheus, Datadog, etc.).
  • Solid grasp of incident management frameworks (P1–P3 handling, RCA, PIRs, on-call rotations).
  • Excellent analytical, troubleshooting, and communication skills.

Benefits

  • Performance-related bonus
  • Benefits

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

  • incident management
  • application-level troubleshooting
  • distributed tracing
  • log analysis
  • SLOs
  • SLIs
  • error budgets
  • Terraform
  • Ansible
  • CI/CD pipelines

Soft Skills

  • analytical skills
  • troubleshooting skills
  • communication skills