
Staff AI/ML Infrastructure Engineer
Vultr
Full-time
Location Type: Remote
Location: United States
Salary
💰 $145,000 - $160,000 per year
About the role
- Design and maintain GPU and bare metal infrastructure in containerized and physical environments
- Build scalable GPU clusters in partnership with networking and provisioning teams
- Ensure reliable, high-performance provisioning of GPU infrastructure
- Develop automated testing systems for GPU-based platforms
- Implement infrastructure solutions for diverse AI/ML workloads
- Benchmark, test, and troubleshoot GPU performance at scale
- Collaborate with hardware vendors on drivers, firmware, and support
- Resolve hardware, software, and performance issues across environments
- Optimize rail and cluster performance across architectures
- Lead technical direction and mentor engineers on infrastructure best practices
Requirements
- 5+ years of experience working with bare metal infrastructure and hardware automation
- Hands-on experience with modern NVIDIA/AMD GPU platforms and high-performance networking (RoCE, InfiniBand)
- Deep knowledge of BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe systems
- Strong Linux systems experience, including device drivers and package management
- Experience building infrastructure automation using Python and Bash
- Familiarity with GPU drivers, firmware ecosystems, and vendor collaboration
- Experience designing and delivering complex infrastructure products
- Proven ability to lead projects and mentor engineers
- Experience optimizing multi-cluster GPU environments
- Exposure to Machine Learning software stacks and GPU workloads
Benefits
- 100% company-paid insurance premiums for employee medical, dental, and vision plans
- 401(k) plan with a 100% match up to 4%, with immediate vesting
- Professional development reimbursement of $2,500 each year
- 11 holidays, plus paid time off accrual with a rollover plan
- Commitment matters to Vultr! Increased PTO at the 3-year and 10-year anniversaries, a 1-month paid sabbatical every 5 years, and an anniversary bonus each year
- $500 stipend for remote office setup in the first year, plus $400 each following year
- Internet reimbursement up to $75 per month
- Gym membership reimbursement up to $50 per month
- Company-paid Wellable subscription
Applicant Tracking System Keywords
Hard Skills & Tools
GPU infrastructure, bare metal infrastructure, hardware automation, NVIDIA GPU, AMD GPU, high-performance networking, Linux systems, Python, Bash, Machine Learning
Soft Skills
leadership, mentoring, collaboration, problem-solving, project management