Infrastructure Engineer, Storage
Lightning AI
Infrastructure Engineer at Lightning AI managing storage systems for high-throughput AI/ML workloads. The role focuses on building and operating distributed storage systems with reliability and efficiency.
Posted 6/24/2026
full-time
New York City • California, New York, Washington • 🇺🇸 United States
Mid-Level, Senior
💰 $180,000 - $200,000 per year
Tech Stack
Tools & technologies: Linux, NFS, Python
About the role
Key responsibilities & impact
- Operate and scale distributed storage systems, including VAST and S3-compatible object storage (e.g., Ceph)
- Improve performance, reliability, and efficiency of storage systems supporting large-scale AI/ML workloads
- Troubleshoot complex storage and data path issues across hardware and software layers
- Optimize storage performance to support high-throughput, low-latency AI training and inference workloads
- Build and maintain automation for provisioning, managing, and monitoring storage infrastructure
- Develop Python-based tools and workflows to reduce manual operational overhead
- Improve lifecycle management of storage clusters, from deployment through maintenance and scaling
- Manage and operate Linux-based systems in production, including bare-metal environments
- Partner with infrastructure and data center teams on hardware bring-up, upgrades, and issue resolution
- Support capacity planning, utilization tracking, and forecasting for storage systems
- Leverage monitoring and telemetry to diagnose issues and improve system performance and reliability
- Work closely with Infrastructure Engineering, Network Engineering, and Platform teams to integrate storage into the broader platform
- Contribute to design discussions around new infrastructure deployments and scaling strategies
- Help define best practices for operating storage systems in high-performance computing environments
Requirements
What you’ll need
- 5+ years of experience in infrastructure engineering, systems engineering, or related roles
- Hands-on experience operating distributed storage systems (e.g., VAST, Ceph, or similar)
- Strong Linux systems experience in production environments
- Proficiency in Python or similar scripting/programming languages for automation
- Experience working with bare-metal infrastructure and hardware-oriented systems
- Ability to debug complex issues across system boundaries (storage, OS, hardware, networking)
- Experience with storage networking protocols (e.g., NFS or similar)
- Experience with capacity planning, monitoring, and performance tuning
Benefits
Comp & perks
- Comprehensive medical, dental, and vision coverage (U.S.); private medical and dental insurance (U.K.)
- Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
- Generous paid time off, plus holidays
- Paid parental leave
- Professional development support
- Wellness and work-from-home stipends
- Flexible work environment
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
distributed storage systems, VAST, S3-compatible object storage, Ceph, Python, Linux, storage networking protocols, capacity planning, performance tuning, automation
Soft Skills
troubleshooting, problem-solving, collaboration, communication, design discussions, best practices definition