NVIDIA

Network Site Reliability Engineer

full-time

Location Type: Hybrid

Location: Reading • 🇬🇧 United Kingdom

Job Level

Senior, Lead

Tech Stack

Ansible, Firewalls, Go, Grafana, Linux, Prometheus, Python, SaltStack, ServiceNow, Switching

About the role

  • Owning the operational aspect of the network infrastructure, ensuring its high availability and reliability.
  • Partnering with architecture and deployment teams to guarantee that new implementations are supportable and align with production standards.
  • Advocating for and implementing automation to reduce toil and enhance operational efficiency.
  • Monitoring network performance, identifying areas for improvement, and coordinating with relevant teams to execute enhancements.
  • Collaborating with SMEs to resolve production issues swiftly and effectively, maintaining customer satisfaction.
  • Identifying opportunities for operational improvements and partnering with teams to develop solutions that drive excellence and sustainability in network operations.

Requirements

  • BS degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent experience.
  • Minimum of 8 years of industry experience in network site reliability engineering, network automation, network operations, or related areas.
  • Experience with both campus and data center networks.
  • Familiarity with network management tools such as Prometheus, Grafana, Alertmanager, Nautobot/NetBox, and BigPanda.
  • Expertise in automating networks using frameworks such as Salt, Ansible, or similar.
  • In-depth experience in one or more of the following: Python, Go.
  • Knowledge of network technologies such as TCP/UDP, IPv4/IPv6, Wireless, BGP, VPN, L2 switching, Firewalls, Load Balancers, EVPN, VXLAN, and Segment Routing.
  • Proven track record in network operations.
  • Proficiency with ServiceNow and Jira.
  • Knowledge of Linux system fundamentals is a plus.
  • Systematic problem-solving approach, coupled with excellent communication skills and a sense of ownership and drive.

Benefits

  • Professional development opportunities
  • Flexible working hours
  • Remote work options

Applicant Tracking System Keywords

Hard skills
network site reliability engineering, network automation, network operations, Python, Go, TCP, UDP, BGP, VPN, Linux
Soft skills
problem-solving, communication, ownership, drive, collaboration, customer satisfaction, operational efficiency, improvement identification, team coordination, advocacy