Vultr

Strategic Technical Account Manager, GPU

full-time

Location Type: Remote

Location: United States

Salary

💰 $115,000 - $140,000 per year

About the role

  • Lead onboarding for customers deploying GPU clusters (bare metal, VMs, or hybrid).
  • Advise on cluster design: multi-GPU topology, NVLink/NVSwitch considerations, RDMA over InfiniBand and RoCE Ethernet, network throughput, and storage IOPS requirements.
  • Guide customers in selecting GPU types and configurations based on workload (training, fine-tuning, inference, embeddings, RAG pipelines).
  • Support distributed frameworks: PyTorch, TensorFlow, DeepSpeed, Megatron, JAX, Ray, Mosaic, Hugging Face, etc.
  • Apply advanced hands-on Kubernetes skills.
  • Apply advanced hands-on SLURM skills.
  • Identify bottlenecks (network, storage, memory bandwidth).
  • Provide tuning recommendations for batch size, mixed precision, parallelization strategies, and checkpointing.
  • Help customers evaluate cost vs. performance tradeoffs (GPU mix, CPU pairing, instance types, cluster sizing).
  • Own the long-term technical strategy across assigned GPU/AI accounts, including hyperscalers, labs, and high-growth AI startups.
  • Host recurring technical review meetings, roadmap reviews, and optimization sessions.
  • Define scaling plans, future GPU reservation needs, and capacity forecasting.
  • Partner with Support, SRE, Networking, NOC, and Product Management & Engineering to resolve high-urgency incidents.
  • Manage outage communications, corrective action plans, and postmortem reviews with customers.
  • Advocate for GPU reliability improvements and influence roadmap priorities.
  • Identify opportunities for expanded clusters, high-speed storage, or networking upgrades.
  • Support Sales with technical validation and architecture diagrams needed for expansion.
  • Provide structured feedback on existing and future GPU offerings, networking fabrics, storage platforms, and upcoming AI/ML platform features.
  • Partner with Product on early access programs (new GPUs, pipelines, orchestration, etc.).

Requirements

  • 2–5+ years as an AI/ML Engineer, AI/ML Ops Engineer, Technical Account Manager, HPC Engineer, Sales/Solutions Engineer, or in a comparable technical role.
  • Strong knowledge of GPU hardware architectures (NVIDIA/AMD), CUDA/ROCm, distributed training, and ML frameworks.
  • Experience with Linux tuning and networking (InfiniBand and RoCE fabrics).
  • Experience with high-performance storage systems (DDN, NetApp, Vast, Weka, etc.).
  • Ability to communicate complex concepts clearly to both executives and engineering teams.
  • Prior experience supporting hyperscalers, AI labs, or large cluster deployments is a plus.
  • Cloud Native Computing Foundation Certified Kubernetes Administrator (CKA) certification is a plus.

Benefits

  • 100% company-paid insurance premiums for employee medical, dental and vision plans.
  • 401(k) plan with a 100% match up to 4% and immediate vesting
  • Professional Development Reimbursement of $2,500 each year
  • 11 Holidays + Paid Time Off Accrual + Rollover Plan
  • Increased PTO at 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
  • $500 stipend for remote office setup in first year + $400 each following year
  • Internet reimbursement up to $75 per month
  • Gym membership reimbursement up to $50 per month
  • Company-paid Wellable subscription
Applicant Tracking System Keywords

Hard Skills & Tools
GPU clusters, multi-GPU topology, NVLink, RDMA, InfiniBand, RoCE Ethernet, Kubernetes, SLURM, CUDA, Linux tuning
Soft Skills
communication, technical strategy, problem-solving, collaboration, customer support, feedback provision, advocacy, capacity forecasting, incident management, roadmap review
Certifications
Cloud Native Computing Foundation Certified Kubernetes Administrator (CKA)