AI Inference Engineer – Model Optimization, Deployment

Zoox

Model Optimization & Deployment Engineer who optimizes large-scale ML models for Zoox's autonomous vehicle technology, with a focus on deploying them for efficient real-time execution in vehicles.

Posted 4/11/2026 • Full-time • Foster City • California, Washington • 🇺🇸 United States • Mid-Level, Senior • 💰 $242,000 - $290,000 per year

Tech Stack

Tools & technologies

Python, PyTorch

About the role

Key responsibilities & impact

Optimize large-scale models (LLMs, VLMs) using advanced quantization (PTQ, QAT), mixed-precision inference workflows, and parameter-efficient fine-tuning (LoRA, QLoRA).
Architect and implement model conversion and compilation pipelines using TensorRT and TensorRT-LLM for edge deployment.
Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch models and compiled edge binaries.
Write and optimize custom CUDA kernels and TensorRT Plugins to maximize memory bandwidth and minimize latency on AI accelerators.
Write production-level, highly concurrent, and memory-safe C++ and Python code for real-time inference on vehicle SoCs.

Requirements

What you’ll need

Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference workflows (INT8, FP8, INT4, BF16/FP16).
Proven experience optimizing large-scale models (LLMs, VLMs, or VLAs) using KV-cache optimization (e.g., PagedAttention), speculative decoding, and efficient attention mechanisms (FlashAttention, linear attention).
Extensive experience with model conversion/compilation pipelines (TensorRT, TensorRT-LLM) and performing rigorous parity/latency benchmarking.
Proficiency in low-level programming for AI accelerators, specifically writing and optimizing custom CUDA kernels and TensorRT Plugins.
Production-level C++ (14/17/20) and Python programming skills, with experience writing concurrent, memory-safe, real-time inference code for edge devices.

Benefits

Comp & perks

Paid time off (e.g. sick leave, vacation, bereavement)
Unpaid time off
Zoox Stock Appreciation Rights
Amazon RSUs
Health insurance
Long-term care insurance
Long-term and short-term disability insurance
Life insurance

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

model quantization, mixed-precision inference, CUDA programming, C++ programming, Python programming, real-time inference, concurrent programming, memory-safe programming, model optimization, latency benchmarking