Salary
💰 $248,000 - $391,000 per year
Tech Stack
Cloud, Distributed Systems, DNS, Go, Linux, Microservices, Python, Terraform
About the role
- Lead initiatives to transform the IT Compute Core Team's architecture and build new service offerings across on-prem and cloud
- Design, scale, and deploy core infrastructure services including DNS, NTP/PTP, DHCP, and LDAP
- Build for performance and reliability at global scale, covering automation, monitoring, high availability, capacity planning, and lifecycle management
- Define and implement metrics to measure service efficiency, and improve it through software and hardware optimizations (SR-IOV/DPU)
- Apply technologies such as eBPF and XDP for observability and DDoS mitigation
- Collect and review system data for capacity planning, analyze it to develop plans for appropriately sized enterprise-wide systems, and coordinate with management on implementing changes
- Develop and maintain tools for collecting, analyzing, and visualizing data for reporting, alerting, and monitoring
- Collaborate with NVIDIA leadership, senior engineers, program managers, and product managers to develop compelling IT products and services that meet customer needs
Requirements
- Bachelor's degree in Engineering, Computer Science, Mathematics, or related field, or equivalent experience
- 15+ years of proven experience in compute platform engineering with a focus on automation
- Experience designing and deploying containerization architectures and distributed systems infrastructure
- Proven experience evaluating existing application architectures and identifying opportunities for containerization to improve scalability, reliability, and efficiency
- Strong analytical skills with the ability to define and track key performance metrics
- Experience developing tools for data analysis and performance profiling, and developing with Terraform and configuration management tools
- Proficiency in programming languages such as Go and/or Python
- Linux OS proficiency, including kernel internals
- Experience running large environments consisting of bare-metal build infrastructure
- Understanding of network protocols and architectures (VLAN/VXLAN/SDN/BGP/Anycast)
- Hands-on experience with containers and their implementation
- Experience deploying and managing services such as DNS and LDAP at scale
- Solid understanding of microservices architecture, infrastructure as code (IaC), and configuration management tools