Hydra Host

Storage Engineer

full-time

Location Type: Office

Location: Miami • Florida • 🇺🇸 United States

Salary

💰 $200,000 - $300,000 per year

Job Level

Senior, Lead

Tech Stack

Cloud, Linux

About the role

  • Define, architect, and implement Hydra Host’s first production storage platform tailored for bare-metal GPU clusters and AI/HPC workloads.
  • Lead all technical decisions around storage stack design, from hardware infrastructure to parallel file system orchestration and performance tuning.
  • Select, build, and maintain storage solutions spanning both block (NVMe, SAN, Ceph, etc.) and object storage (S3-compatible, custom, or Ceph Object Gateway) layers.
  • Design for high-throughput, low-latency access, supporting large datasets, rapid checkpointing, and parallel access for distributed AI training workloads.
  • Integrate and optimize parallel file systems such as Lustre, BeeGFS, Spectrum Scale, WekaIO, or CephFS, ensuring maximum performance and fault tolerance.
  • Ensure compatibility across Hydra’s diverse GPU/OEM ecosystem, accounting for unique firmware, BMC/Redfish APIs, and hardware configurations.
  • Develop automation, observability, and management tooling for storage, focusing on reliability, scalability, and efficiency.
  • Act as a builder and architect: deeply hands-on in deployment, troubleshooting, and optimization, while guiding the long-term storage roadmap.
  • Collaborate cross-functionally with GPU, HPC, and platform engineering teams to integrate storage with compute and network layers.
  • Interface with customers and product leadership to define feature priorities, performance benchmarks, and future enhancements.

Requirements

  • 8+ years of progressive, hands-on experience designing and implementing high-performance storage systems for compute clusters in HPC, AI, or bare-metal cloud environments.
  • Proven track record building storage infrastructure from scratch, not just operating existing systems.
  • Deep expertise in block storage (NVMe, SAN, Ceph, distributed block systems) and object storage (S3, MinIO, Ceph Object Gateway, etc.).
  • Strong background in parallel file systems (WekaIO, BeeGFS, Lustre, Spectrum Scale, or similar) supporting GPU or AI cluster workloads.
  • Solid foundation in Linux systems engineering, automation, and scripting for distributed environments.
  • Familiarity with BMC, Redfish APIs, and OEM server firmware for bare-metal management.
  • Deep understanding of AI/ML data pipelines: model checkpointing, data locality, and multi-tiered storage optimization.
  • Excellent problem-solving, debugging, and communication skills, able to translate technical decisions into clear architectural direction.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
storage systems design, high-performance storage, block storage, object storage, parallel file systems, Linux systems engineering, automation, scripting, AI/ML data pipelines, performance tuning
Soft skills
problem-solving, debugging, communication, collaboration, leadership