Lightning AI

Infrastructure Operations Engineer

The Infrastructure Operations Engineer manages next-generation AI infrastructure across GPU systems at Lightning AI, collaborating with teams to improve operational efficiency and minimize incidents.

Posted 6/24/2026 • full-time • Remote • California, New York, Washington • 🇺🇸 United States • Senior, Lead • 💰 $160,000 - $200,000 per year

Tech Stack

Tools & technologies
Ansible, AWS, Go, Kubernetes, Linux, NFS, Prometheus, Python, Terraform

About the role

Key responsibilities & impact
  • At the direction of the Manager of Infrastructure Operations, design, build, and roll out new platforms and patterns to minimize incidents and enable customer-facing and internal features.
  • Deploy updates and improvements to support both Voltage Park’s internal and end-customer use cases.
  • Collaborate with colleagues in the Infrastructure Engineering, Network Operations, Customer Success, and Software and Platform Development teams.
  • Participate in the on-call rotation, which is evenly distributed across all team members in a primary/secondary pattern: you serve as primary, then move to the secondary position.

Requirements

What you’ll need
  • 8+ years working with Linux as a server/hosting platform; extra points for Ubuntu experience.
  • 5+ years of experience with AWS.
  • 2+ years of experience with Kubernetes and strong container fundamentals.
  • 2+ years of experience with Terraform and Ansible.
  • 2+ years of experience managing network-attached storage (via NFS, Ceph, or other protocols); extra points for experience with VAST storage systems.
  • Experience with monitoring systems (Prometheus, ELK stack).
  • Familiarity with the GitOps workflow.
  • Software development experience using Python, Go, Bash, or other languages to automate tasks and connect systems and APIs.
  • Deep networking fundamentals; extra points for experience with datacenter-level networks, 400Gb Ethernet, and InfiniBand.
  • Experience building and delivering complex systems.
  • Effective at navigating tradeoffs between design, risk, cost, and outcomes.
  • Comfortable with navigating ambiguity.
  • Strong written and oral communication.

Benefits

Comp & perks
  • Comprehensive medical, dental, and vision coverage (U.S.); private medical and dental insurance (U.K.)
  • Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
  • Generous paid time off, plus holidays
  • Paid parental leave
  • Professional development support
  • Wellness and work-from-home stipends
  • Flexible work environment

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Linux, Ubuntu, AWS, Kubernetes, Terraform, Ansible, NFS, Ceph, Python, Go
Soft Skills
communication, collaboration, problem-solving, navigating ambiguity, tradeoff analysis