
Network Site Reliability Engineer
NVIDIA
full-time
Location Type: Hybrid
Location: Reading • 🇬🇧 United Kingdom
Job Level
Senior, Lead
Tech Stack
Ansible, Firewalls, Go, Grafana, Linux, Prometheus, Python, SaltStack, ServiceNow, Switching
About the role
- Owning the operational aspect of the network infrastructure, ensuring its high availability and reliability.
- Partnering with architecture and deployment teams to guarantee that new implementations are supportable and align with production standards.
- Advocating for and implementing automation to reduce toil and enhance operational efficiency.
- Monitoring network performance, identifying areas for improvement, and coordinating with relevant teams to execute enhancements.
- Collaborating with SMEs to resolve production issues swiftly and effectively, maintaining customer satisfaction.
- Identifying opportunities for operational improvements and partnering with teams to develop solutions that drive excellence and sustainability in network operations.
Requirements
- BS degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent experience.
- Minimum of 8 years of industry experience in network site reliability engineering, network automation, network operations, or related areas.
- Experience with both campus and data center networks.
- Familiarity with network management tools such as Prometheus, Grafana, Alertmanager, Nautobot/NetBox, and BigPanda.
- Expertise in automating networks using frameworks such as Salt, Ansible, or similar.
- In-depth experience with one or more of the following: Python, Go.
- Knowledge of network technologies such as TCP/UDP, IPv4/IPv6, wireless, BGP, VPN, L2 switching, firewalls, load balancers, EVPN, VXLAN, and Segment Routing.
- Proven track record in network operations.
- Experience with ServiceNow and Jira.
- Knowledge of Linux system fundamentals is a plus.
- Systematic problem-solving approach, coupled with excellent communication skills and a sense of ownership and drive.
Benefits
- Professional development opportunities
- Flexible working hours
- Remote work options
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
network site reliability engineering, network automation, network operations, Python, Go, TCP, UDP, BGP, VPN, Linux
Soft skills
problem-solving, communication, ownership, drive, collaboration, customer satisfaction, operational efficiency, improvement identification, team coordination, advocacy