FluidStack

Senior / Staff Infrastructure Engineer, Compute

full-time

Origin: 🇺🇸 United States • New York

Job Level

Senior

Tech Stack

Ansible, Cloud, Kubernetes, Linux, Terraform

About the role

  • Design, deploy, and manage the compute infrastructure powering Fluidstack's GPU clusters
  • Design and implement GPU/ASIC infrastructure at the server, rack, and system level
  • Troubleshoot complex failures in GPUs and compute systems
  • Develop and maintain hardware/firmware management services
  • Automate all aspects of the server lifecycle
  • Own end-to-end compute lifecycle, including partnering with vendors on RMAs
  • Serve as the main point of contact for hardware escalation and troubleshooting
  • Monitor system performance, identifying and resolving bottlenecks
  • Automate deployment and management tasks to improve efficiency
  • Collaborate with storage and network teams to ensure cohesive infrastructure operations
  • Work closely with hardware and software teams to support AI workloads

Requirements

  • 5+ years of experience in compute infrastructure engineering
  • Strong knowledge of Linux systems administration and performance tuning
  • Experience with bare-metal provisioning tools (e.g., MAAS, Metal3, Tinkerbell)
  • Familiarity with GPU hardware and workload optimization, especially kernel- and driver-level requirements
  • Proficiency in automation tools (e.g., Ansible, Terraform)
  • Experience operating Kubernetes and Slurm clusters