FluidStack

Senior Data Center Operations Engineer

full-time

Origin: 🇺🇸 United States • New York

Job Level

Senior

Tech Stack

Cloud

About the role

  • Own regional data center operations end-to-end. Manage power, cooling, and rack infrastructure across multiple sites.
  • Develop high-efficiency rack designs. Maximize space and power utilization to save millions in energy and infrastructure costs.
  • Build automation tools that eliminate routine tasks. Every manual process is an opportunity to code a solution.
  • Configure and manage global PDU infrastructure. Integrate with monitoring systems to calculate PUE, generate alerts, and create reports.
  • Lead DCIM implementation and adoption. Track assets across multiple locations, reduce asset retrieval time, and maintain 99.99%+ accuracy.
  • Drive automation initiatives. Integrate tracking systems with DCIM for real-time asset lifecycle updates.
  • Design reports and dashboards. Identify rack and power density improvements to drive efficiency and capacity optimization.
  • Maintain policy and procedure documents. Update SOPs and MOPs for compliance and efficiency.
  • Use ticketing and knowledge base applications. Leverage Jira and Confluence to manage workflow and documentation.
  • Document everything. Write clear procedures that enable others to execute flawlessly.

Requirements

  • 5+ years managing data center operations at scale. Experience with hardware integration, capacity planning, and infrastructure optimization.
  • Proven track record of achieving 99.99%+ accuracy in physical audits across multiple regions and countries.
  • Experience managing DCIM implementations. You've tracked 100K+ assets and reduced retrieval times by 75%.
  • Strong vendor management skills. Experience with ITAD relationships, hardware disposals, and generating revenue from e-waste.
  • Expertise in power infrastructure. Knowledge of PDU configuration, PUE calculations, and energy optimization.
  • Experience leading large-scale migrations. You've executed 10+ full-cage relocations while maintaining continuous service.
  • Automation mindset. You've integrated RFID tracking with DCIM and improved data accuracy from 80% to 99.999%.
  • Strong negotiation skills. You negotiate effectively with vendors and hold partners accountable.
  • Strong technical documentation skills. Experience creating and maintaining SOPs, MOPs, and training materials.
  • Data-driven approach. You create dashboards and reports that drive infrastructure decisions.
  • Extreme ownership mentality. You see problems through from identification to resolution.
  • Experience with GPU infrastructure and high-performance computing environments (Nice to have).
  • Familiarity with AI/ML workloads and their infrastructure requirements (Nice to have).
  • Knowledge of liquid cooling systems for high-density compute (Nice to have).
  • Experience building custom monitoring and automation tools (Nice to have).
  • Background in hyperscale or cloud data center operations (Nice to have).