TensorWave

Software Engineer

TensorWave is hiring a Senior Software Engineer to automate GPU cluster provisioning and operations. The role works cross-functionally to support business objectives and uphold operational excellence standards.

Posted 6/10/2026 • Full-time • Las Vegas, Nevada, 🇺🇸 United States • Mid-Level / Senior

Tech Stack

Tools & technologies
Go, Grafana, gRPC, Kubernetes, Node.js, Prometheus

About the role

Key responsibilities & impact
  • Build and maintain fully automated pipelines for provisioning bare metal GPU clusters from zero to production
  • Automate Slurm and Kubernetes cluster lifecycle—bootstrapping, upgrades, node provisioning, and decommissioning at scale
  • Develop and maintain infrastructure for GPU node configuration, including drivers and firmware
  • Own cluster validation pipelines, automating health checks and GPU burn-in tests
  • Build day-2 operations automation, including node remediation, rolling upgrades, and automated drain/cordon workflows
  • Write and maintain runbooks and documentation to enable reliable, repeatable operations
  • Own the full observability stack for automation services, provisioning pipelines, and cluster health systems

Requirements

What you’ll need
  • 5+ years in infrastructure engineering or platform engineering
  • 3+ years writing production Go
  • Deep understanding of Kubernetes internals, including:
      • Informers and work queues
      • Controller-runtime and client-go
      • CRDs, custom controllers, and operators
      • Admission webhooks
  • Experience building Kubernetes Operators
  • Experience building gRPC and REST APIs in Go at production scale
  • Familiarity with bare metal infrastructure concepts, including PXE, IPMI, and BMC
  • Strong testing discipline across unit, integration, and end-to-end tests
  • Proven ownership of observability stacks such as Prometheus, Grafana, OpenTelemetry, and Loki (or similar)

Benefits

Comp & perks
  • Stock Options
  • 100% paid Medical, Dental, and Vision insurance for Employees
  • Company Health Savings Account Contributions
  • 100% paid Short Term and Long Term Disability Insurance for Employees
  • Life and Voluntary Supplemental Insurance Options
  • Other Insurance Options, such as Pet & Legal Insurance
  • Various Supplementary Health Benefits, such as discounted Virtual Healthcare Appointments and Serious Illness Support
  • Flexible Spending Account
  • 401(k)
  • Employee Assistance Program
  • Flexible PTO
  • Paid Holidays
  • Parental Leave
  • Other In-Office Perks
