Zoox

Machine Learning Engineer – Multi-Modality Foundation Model

full-time

Location Type: Hybrid

Location: Foster City; California; Massachusetts; United States

Salary

💰 $189,000 - $258,000 per year

About the role

  • Build, pre-train, and evaluate large-scale multi-modality foundation models from the ground up, successfully aligning diverse data streams (e.g., Vision, LiDAR, Radar, Language, Audio).
  • Define and execute the ML roadmap for deploying these multi-modality representations to the vehicle.
  • Architect and implement Knowledge Distillation pipelines to compress large-capacity multi-modal teacher models into highly efficient, production-ready student models.
  • Build high-quality training and evaluation datasets, applying advanced data-centric techniques to maximize cross-modal representation learning and student model convergence.
  • Collaborate with downstream perception teams to integrate and validate the performance, robustness, and latency of your models in on-board production systems.

Requirements

  • MS or PhD in Computer Science, Machine Learning, or a related technical field with demonstrated professional experience.
  • Deep, proven expertise in building and training large-scale multi-modality foundation models (e.g., Vision-Language Models (VLMs), Vision-Audio-Text, or Vision-LiDAR-Radar architectures).
  • Strong understanding of cross-modal alignment, multi-modal attention mechanisms, and large-scale pre-training techniques.
  • Proven experience in Knowledge Distillation (KD), model compression, and training highly efficient student models for production environments.
  • Proficiency in ML frameworks (e.g., PyTorch) and experience building large-scale ML training and evaluation pipelines.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
machine learning, multi-modality foundation models, Knowledge Distillation, model compression, cross-modal alignment, multi-modal attention mechanisms, large-scale pre-training techniques, training datasets, evaluation datasets, ML frameworks
Certifications
MS in Computer Science, PhD in Machine Learning