Head of Inference

Montauk Capital

Head of Inference at Stealth Edge AI Co, defining inference architecture and building proof-of-concept systems. Collaborate with leadership on pioneering AI solutions.

Posted 5/6/2026 • full-time • New York City • New York • 🇺🇸 United States • Lead

Tech Stack

Tools & technologies

Cloud, Distributed Systems, Kubernetes, Node.js, Ray, Rust

About the role

Key responsibilities & impact

Create the inference strategy and define the inference architecture for Edge AI
Own the inference serving layer end-to-end: vLLM, TensorRT-LLM, Triton, or equivalent
Build a credible POC fast that proves to NVIDIA, cloud providers, and customers that the platform works
Drive cost-per-token optimization
Optimize GPU utilization, KV-cache management, and batching for production workloads
Own observability and reliability SLAs
Build distributed inference pipelines across multi-GPU, multi-node edge deployments
Set performance baselines and SLAs for inference latency and throughput
Define quantization strategy
Translate complex inference requirements into infrastructure designs
Define the software access layer architecture and oversee integration efforts
Engage credibly with investors, partners, and technical stakeholders, and represent the company externally

Requirements

What you’ll need

Production inference serving (vLLM, TensorRT-LLM, Triton Inference Server, or equivalent), distributed at scale
Quantization, SGLang, containerization, cost-per-token optimization
Observability tooling: distributed tracing, latency profiling, alerting. Instrument and debug complex distributed systems with a focus on building world-class observability and debuggability tools
C++/CUDA/Rust
GPU utilization and CUDA kernel optimization — has pushed hardware to its limits
Batching, KV-cache, speculative decoding expertise
Scale systems using Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference
Has built a serving system that NVIDIA and cloud providers respect
Model deployment and serving
Systems engineering
Technical leadership experience, either over teams or outcomes
Startup / 0→1 DNA: You ship fast and communicate clearly

Benefits

Comp & perks

Competitive compensation + equity: True ownership over what you build

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

vLLM, TensorRT-LLM, Triton Inference Server, C++, CUDA, Rust, quantization, batching, GPU optimization, distributed systems

Soft Skills

technical leadership, communication, stakeholder engagement, problem-solving, observability focus, debugging, cost optimization, performance optimization, collaboration, adaptability