
Senior/Staff Virtualization Engineer
fal
full-time
Location Type: Office
Location: San Francisco • California • United States
Salary
💰 $180,000 - $250,000 per year
About the role
- Build and deliver custom environments with excellent GPU performance for customer workloads
- Use AI extensively to automate provisioning, alerting, and recovery
- Provision and configure dedicated Kubernetes clusters tailored to customer requirements
- Design and implement overlay networking (VLAN, VXLAN), routing configurations (ECMP, BGP), and tunnels (strongSwan, IPsec) for tenant isolation and performance
- Build and maintain Linux images
- Set up network monitoring and diagnostics for customer environments
- Automate the end-to-end lifecycle of customer compute environments: creation, configuration, validation, and teardown
Requirements
- 5+ years of experience with Linux virtualization: KVM/QEMU, libvirt, VFIO device passthrough, hugepages, NUMA, CPU pinning
- Strong networking fundamentals: VXLAN, VLAN, ECMP, BGP, ARP, and the ability to debug packet-level issues (tcpdump, Wireshark)
- Production experience building and operating Kubernetes clusters on bare metal (MetalLB)
- Proficiency with Linux image building and OS provisioning (kickstart, cloud-init, PXE/iPXE)
- Proficiency in Python, Bash, Ansible, and Terraform
- Deep experience with NVIDIA GPUs: drivers, MIG, container runtimes (nvidia-container-toolkit), InfiniBand, RDMA/RoCEv2 and GPUDirect for high-performance AI networking
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement
Benefits
- Health, dental, and vision insurance (US)
- Regular team events and offsites
- Visa sponsorship and relocation assistance
- Learning and growth opportunities
Applicant Tracking System Keywords
Hard Skills & Tools
Linux virtualization, KVM, QEMU, libvirt, VFIO device passthrough, hugepages, NUMA, CPU pinning, Kubernetes, Python
Soft Skills
excellent communication, technical decision making, self-starter, ownership, continuous improvement