TensorWave

Staff Infrastructure Engineer – Storage Platform

The Staff Infrastructure Engineer for the Storage Platform is responsible for the design, operation, and evolution of storage systems, and collaborates with cross-functional partners to support business objectives.

Posted 6/10/2026 | full-time | Las Vegas • Nevada • 🇺🇸 United States | Lead

Tech Stack

Tools & technologies
Ansible, Distributed Systems, Grafana, Kubernetes, Linux, Prometheus, Terraform

About the role

Key responsibilities & impact
  • Design and evolve storage architectures supporting Kubernetes (block, file, and object storage), AI/ML, and high-performance compute workloads
  • Evaluate and select storage technologies based on performance (IOPS, throughput, latency), scalability and fault tolerance, and operational complexity and maintainability
  • Define storage standards, best practices, and reference architectures
  • Design for resilience over traditional HA, including failure-domain awareness
  • Own production storage platforms, including Ceph (RBD, CephFS, RGW) and high-performance NAS (Weka, VAST, or similar)
  • Lead lifecycle operations: cluster design and deployment, expansion and scaling, and upgrades and migrations
  • Perform and guide capacity planning, performance tuning, and failure analysis
  • Analyze storage performance across IOPS, throughput, latency, and tail latency
  • Identify and resolve bottlenecks across disk subsystems, network paths (including RDMA), and client access patterns
  • Lead root cause analysis for storage-related incidents
  • Ensure storage platforms meet the demands of GPU and Kubernetes workloads
  • Define and implement Kubernetes storage patterns: CSI drivers, StorageClasses, and persistent storage design
  • Troubleshoot complex Kubernetes storage issues involving stateful workloads, provisioning failures, and performance anomalies
  • Partner with platform teams to align storage with workload requirements
  • Design and implement automation for storage deployment and configuration and for cluster lifecycle management
  • Leverage tools such as Ansible, Terraform, and Kubernetes manifests or Helm
  • Integrate storage platforms into observability stacks (Prometheus, Grafana, etc.)
  • Serve as the technical authority for storage across the organization
  • Mentor engineers on storage systems, performance, and troubleshooting
  • Establish operational standards and best practices
  • Drive continuous improvement of storage reliability and performance

Requirements

What you’ll need
  • 7+ years of experience in infrastructure, storage, or distributed systems
  • Deep hands-on experience with distributed storage systems in production
  • Strong experience with Ceph (RBD, CephFS, and/or RGW)
  • Strong Linux systems expertise
  • Experience with high-performance storage platforms such as Weka, VAST Data, or similar
  • Strong understanding of storage performance characteristics, data replication and failure domains, and distributed system design principles
  • Ability to troubleshoot across storage, network, and compute layers
  • Experience supporting AI/ML or HPC workloads
  • Familiarity with NVMe-based architectures and RDMA or high-throughput Ethernet
  • Experience integrating storage with Kubernetes at scale
  • Experience operating across multiple data centers
  • Exposure to object storage and S3-compatible APIs

Benefits

Comp & perks
  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance for Employees
  • Company Health Savings Account Contributions
  • 100% paid Short Term and Long Term Disability Insurance for Employees
  • Life and Voluntary Supplemental Insurance Options
  • Other Insurance Options, such as Pet & Legal Insurance
  • Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
  • Flexible Spending Account
  • 401(k)
  • Employee Assistance Program
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Other In-Office Perks

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kubernetes, Ceph, Linux, distributed storage systems, high-performance storage, storage performance characteristics, data replication, troubleshooting, AI/ML workloads, HPC workloads
Soft Skills
leadership, mentoring, problem-solving, communication, collaboration, continuous improvement, operational standards