
Senior Software Engineer – NIM Factory Infrastructure
NVIDIA
full-time
Location Type: Remote
Location: California • New York • United States
Salary
💰 $148,000 - $287,500 per year
About the role
- Develop, analyze, and optimize factory infrastructure that takes in an AI model and produces a deployable service validated across cloud, on-prem, and Kubernetes environments.
- With the team, define the group's technical strategies and roadmaps and iterate on them rapidly to deliver and improve the NIM factory.
- You will develop test harnesses, automate hardware acceptance, analyze benchmarks, and gather data for statistical analysis of system health and performance analysis of NIMs.
- Work with technical leaders to design and develop scalable, reliable factory acceptance and performance tuning for hardware platforms.
- You will collaborate with multiple AI model teams to understand their requirements to build an efficient infrastructure that improves every team's productivity.
- Define metrics and drive improvements based on user feedback.
- You will mentor and collaborate throughout the team and with other teams to grow your colleagues and yourself.
Requirements
- A history of using your advanced programming skills to build tooling and automation for hardware system characterization and benchmarking.
- Proven experience debugging and analyzing the performance of compute applications and systems.
- Deep technical expertise working with system software and platform layers, including the kernel, device drivers, memory, storage, networking, and PCIe devices.
- Experience working with hardware clusters, distributed systems, networking, GPU interconnects (PCIe, NVLink), and node and cluster interconnects (InfiniBand).
- Passion for building platform engineering components and automation of system benchmarking and characterization.
- Excellent interpersonal skills and the ability to lead multi-functional efforts.
- BS or MS in Computer Science, Computer Engineering or related field (or equivalent experience)
- 5+ years of demonstrated experience in roles developing performant microservices, cloud software, and/or tooling.
Benefits
- equity
- benefits
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
programming, automation, benchmarking, performance analysis, system software, platform layers, hardware characterization, microservices, cloud software, Kubernetes
Soft skills
interpersonal skills, leadership, collaboration, mentoring, communication