NVIDIA

Senior ML Evaluation Engineer – Autonomous Vehicles

full-time

Location Type: Remote

Location: California, District of Columbia, United States

Salary

💰 $184,000 - $356,500 per year

About the role

  • Design and build learned evaluation pipelines that assess driving behavior using LLMs, VLMs, and multimodal models
  • Develop agentic workflows that chain model inference, retrieval, and structured reasoning to evaluate complex driving scenarios
  • Define evaluation-of-evaluation methodology — how do we know our learned evaluators are correct?
  • Build golden-set frameworks and calibration loops for learned metrics
  • Partner with AML (Alpamayo Logos) teams on model-specific eval needs (e.g., COT prediction quality, AML regression coverage)
  • Instrument evaluation systems with robust experiment tracking, A/B comparison tooling, and model versioning
  • Contribute to the team's transition from rule-based to learned evaluation: identify metrics and analyzers that are candidates for ML replacement and build the alternatives

Requirements

  • PhD with 4+ years, MS with 6+ years, or BS (or equivalent experience) with 8+ years of relevant experience in Computer Science, Computer Engineering, or a related technical field.
  • Hands-on experience building LLM/VLM-based pipelines — fine-tuning, prompt engineering, retrieval-augmented generation, chain-of-thought
  • Track record of shipping ML systems to production (not just prototyping or publishing)
  • Strong software engineering fundamentals — you write clean, tested, reviewable code in Python and C++
  • Experience with evaluation methodology: precision/recall, inter-rater reliability, calibration, annotation pipelines
  • Comfort with large-scale data processing (Spark, Dask, or similar)
  • Strong Python skills.
  • Experience with PyTorch or JAX.
  • Comfortable with GPU-based training workflows.

Benefits

  • equity
  • benefits

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
LLM, VLM, multimodal models, model inference, retrieval-augmented generation, chain-of-thought, Python, C++, PyTorch, JAX
Soft Skills
clean code, tested code, reviewable code
Certifications
PhD, MS, BS