Specialty Solutions Engineer – AI-Modern Data Center

Thinkahead Consultant Psychologist Pty Ltd

full-time

Location Type: Remote

Location: United States

Salary

💰 $250,000 - $300,000 per year

About the role

  • Lead technical discovery and solution positioning for enterprise AI infrastructure opportunities, translating business outcomes into reference architectures and value propositions.
  • Own pre-sales deliverables, including architectures, diagrams, sizing, BOMs, and proposals.
  • Deliver executive and technical presentations focused on NVIDIA AI Enterprise (NVAIE), LLM training/inference, and accelerated analytics.
  • Guide clients through technology selection, roadmap development, and business case creation for large-scale AI initiatives.
  • Architect end-to-end AI platforms using NVIDIA DGX/HGX, Blackwell (B100/B200), Hopper (H100/H200), Grace/Grace-Hopper (GH200), L40S, NVLink/NVSwitch, InfiniBand (NVIDIA Quantum), RoCE, and DPU offload patterns.
  • Design solutions leveraging AMD Instinct (MI300/MI300X) as appropriate, articulating trade-offs in CPU/GPU/DPU, interconnect topology, and cluster scale-out.
  • Integrate NVIDIA AI Enterprise components (CUDA, cuDNN, TensorRT, Triton Inference Server, RAPIDS) and common ML frameworks (PyTorch, TensorFlow) with orchestration platforms.
  • Integrate on-prem GPU clusters with cloud AI services (AWS SageMaker, Azure ML, GCP Vertex AI) for hybrid bursting and workload mobility.
  • Advise on MLOps platforms (MLflow, Kubeflow, Weights & Biases), CI/CD, and governance for multi-tenant AI environments.
  • Build and maintain relationships with NVIDIA, AMD, Run:AI, OEMs, and networking vendors, aligning campaigns with partner programs and incentives.
  • Contribute feedback to vendor engineering and product teams, coordinating joint enablement and reference designs.
  • Create repeatable assets such as validated designs, sizing calculators, POV guides, deployment runbooks, and competitive playbooks.
  • Mentor SEs and delivery consultants, leading internal training on AI scheduling, performance tuning, and operational best practices.
  • Lead proof-of-value (POV) and proof-of-concept (POC) engagements, including success criteria, benchmarking, and recommendations.

Requirements

  • Proven experience architecting and deploying NVIDIA GPU-based AI platforms (NVAIE, DGX/HGX, Blackwell, Hopper, Grace, L40S, H100/H200, B100/B200, GH200) and/or AMD Instinct MI300/MI300X.
  • Experience with Run:AI, NVIDIA Base Command, Kubernetes (GPU Operator), Slurm, and/or vSphere with Tanzu for AI/ML workloads.
  • Advanced knowledge of AI/ML frameworks and libraries (PyTorch, TensorFlow, RAPIDS, Triton, CUDA, cuDNN, TensorRT).
  • Strong understanding of high-speed networking for AI (InfiniBand, RoCE, DPU integration, NVLink, NVSwitch).
  • Experience integrating on-prem AI infrastructure with public cloud AI services (AWS SageMaker, Azure ML, GCP Vertex AI) and hybrid architectures.
  • Experience leading pre-sales campaigns, POV/POC management, and executive presentations.
  • Ability to identify and leverage emerging datacenter and AI technologies to drive innovative solutions.
  • Strong analytical skills for troubleshooting complex environments, including storage, compute, networking, and AI workloads.
  • Skilled at guiding clients through decision-making with clear, strategic recommendations.
  • Proven track record of working effectively across sales, engineering, and vendor teams.
  • Knowledge of datacenter security best practices and regulatory compliance.

Benefits

  • Medical, Dental, and Vision Insurance
  • 401(k)
  • Paid company holidays
  • Paid time off
  • Paid parental and caregiver leave
  • Plus more! See https://www.aheadbenefits.com/ for additional details.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

NVIDIA AI Enterprise, NVIDIA DGX, NVIDIA HGX, AMD Instinct, PyTorch, TensorFlow, CUDA, cuDNN, TensorRT, MLOps

Soft Skills

analytical skills, strategic recommendations, mentoring, relationship building, communication, leadership, troubleshooting, guiding clients, collaboration, presentation skills