Loft Labs

AI Infrastructure Specialist

full-time

Location Type: Remote

Location: California, Massachusetts, United States

Salary

💰 $150,000 - $200,000 per year

About the role

  • Drive end-to-end technical deployments for GPU neocloud and AI Factory customers, from initial bare metal configuration to a validated vCluster environment.
  • Configure and troubleshoot bare metal GPU node infrastructure, including CNI configuration, GPU Operator setup, distributed storage backends, and RDMA/InfiniBand.
  • Deploy and validate Kubernetes and vCluster to provide GPU-powered managed K8s.
  • Work alongside customer teams to build self-sufficiency, ensuring they can operate and grow the platform independently.
  • Document reusable playbooks and deployment architectures so your learnings become the next customer's head start.
  • Collaborate with Engineering and Product to surface recurring infrastructure challenges, acting as a direct feedback loop from the field into the roadmap.
  • Join Sales in the pre-sales process where deep infrastructure work is required to achieve a meaningful proof of value.

Requirements

  • 5+ years of experience deploying and operating Kubernetes in production, ideally on bare metal or in high-complexity environments.
  • Practical knowledge of the NVIDIA GPU Operator, CUDA tooling, and systems-level configuration for GPU nodes.
  • Deep understanding of CNI plugins, overlay networks, load balancing, and connectivity diagnosis in layered environments.
  • Experience with persistent volume configuration, CSI drivers, and distributed storage systems such as Ceph, Rook, Weka, or Longhorn.
  • Comfort operating in ambiguous, fast-moving environments where you are often writing the playbook in real time.
  • You thrive in environments that reject legacy tech and prefer a modern stack where you can solve a variety of problems from pipelines to internal services.

Benefits

  • Competitive Salary: We offer a competitive compensation package, including equity.
  • Platinum-Level Insurance: Health, dental, vision, and life insurance, including plans for you and eligible dependents (benefits vary by country).
  • Flexible Working Schedule: Have a doctor’s appointment, or need to pick up groceries at 2pm? We won’t have an issue with that. To us, results matter more than clocking in and out at the same time every day.
  • Workplace Flexibility: We’re very flexible about where you work. We know things can change in life and we’re happy to adjust the work environment for you along the way.
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kubernetes, GPU Operators, CUDA, CNI plugins, persistent volume configuration, CSI drivers, distributed systems, Ceph, Rook, Weka
Soft Skills
collaboration, problem-solving, documentation, customer engagement, self-sufficiency, adaptability, communication, feedback, independence, initiative