Zayo Group

Manager, Network Observability

full-time

Origin: 🇺🇸 United States

Salary

💰 $97,900 - $150,600 per year

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Firewalls, Google Cloud Platform, Grafana, Prometheus, Python, ServiceNow, TCP/IP

About the role

  • Lead and Mentor: Build, lead, and inspire a high-performing team of network observability engineers and specialists, fostering a culture of continuous learning, ownership, and technical excellence.
  • Strategy & Roadmap: Define and execute the strategic roadmap for network observability, aligning with overall network engineering and business objectives. This includes evolving our monitoring, alerting, logging, and tracing capabilities.
  • Platform Ownership: Serve as the primary owner of the network observability platform, optimizing its configuration, performance, and integrations to ensure it provides a unified, real-time view of network health. Drive feature adoption and define future enhancements.
  • Metrics & Alerts: Establish, refine, and enforce comprehensive network health metrics (SLIs, SLOs, KPIs) and develop intelligent, actionable alerting strategies to minimize noise and improve incident response.
  • Incident Management & Post-Mortems: Collaborate closely with NOC and SRE teams to improve incident detection, triage, and resolution processes. Drive blameless post-mortems and ensure lessons learned are translated into system and process improvements.
  • Automation: Champion and implement automation initiatives within observability, leveraging tools and scripting (e.g., Python, Ansible) to automate new service onboarding, data collection, analysis, reporting, and remediation workflows.
  • Cross-Functional Collaboration: Partner effectively with Network Engineering, Software Engineering, NOC, ISP/OSP, Security, and Product teams to understand their observability needs, provide necessary insights, and ensure seamless integration of monitoring solutions.
  • Tooling & Ecosystem: Evaluate, select, and integrate supplementary observability tools and technologies as needed to complement Assure1 and enhance our overall network visibility.
  • Reporting & Insights: Develop and deliver insightful reports and dashboards that provide clear visibility into network performance, reliability, and trends for various stakeholders, from operations to executive leadership.
  • Vendor Management: Manage relationships with key observability vendors, including Assure1, to ensure optimal licensing, support, and feature development.

Requirements

  • Bachelor's degree in Computer Science, Electrical Engineering, or a related technical field; or equivalent practical experience.
  • Minimum of five (5) years of experience in network operations or site reliability engineering.
  • Minimum of two (2) years in a leadership, management, or product owner role.
  • Hands-on experience with network monitoring and event management platforms.
  • Strong understanding of networking protocols (TCP/IP, BGP, OSPF), network services, and common network devices (routers, switches, firewalls, load balancers).
  • Proven ability to define, implement, and optimize network health metrics (SLIs, SLOs) and alerting strategies.
  • Experience with scripting and automation (e.g., Python or Ansible).
  • Excellent analytical and problem-solving skills, with a track record of driving root cause analysis on complex issues.
  • Strong communication and interpersonal skills, with the ability to articulate technical concepts clearly to both technical and non-technical audiences.
  • Experience with incident management processes and tools, including ServiceNow and troubleshooting tools.