NVIDIA

Senior ML Evaluation Engineer – Autonomous Vehicles

The Senior ML Evaluation Engineer at NVIDIA designs evaluation pipelines for autonomous vehicles and builds systems that bridge ML research and production evaluation.

Posted 4/17/2026 • Full-time • Senior • Remote (California, District of Columbia, North Carolina, Washington), 🇺🇸 United States • 💰 $184,000 - $356,500 per year

Tech Stack

Tools & technologies
Python, PyTorch, Spark

About the role

Key responsibilities & impact
  • Design and build learned evaluation pipelines that assess driving behavior using LLMs, VLMs, and multimodal models
  • Develop agentic workflows that chain model inference, retrieval, and structured reasoning to evaluate complex driving scenarios
  • Define evaluation-of-evaluation methodology — how do we know our learned evaluators are correct?
  • Build golden-set frameworks and calibration loops for learned metrics
  • Partner with AML (Alpamayo Logos) teams on model-specific eval needs (e.g., COT prediction quality, AML regression coverage)
  • Instrument evaluation systems with robust experiment tracking, A/B comparison tooling, and model versioning
  • Contribute to the team's transition from rule-based to learned evaluation: identify metrics and analyzers that are candidates for ML replacement and build the alternatives

Requirements

What you’ll need
  • PhD with 4+ years, MS with 6+ years, or BS (or equivalent experience) with 8+ years of relevant experience in Computer Science, Computer Engineering, or a related technical field.
  • Hands-on experience building LLM/VLM-based pipelines — fine-tuning, prompt engineering, retrieval-augmented generation, chain-of-thought
  • Track record of shipping ML systems to production (not just prototyping or publishing)
  • Strong software engineering fundamentals — you write clean, tested, reviewable code in Python and C++
  • Experience with evaluation methodology: precision/recall, inter-rater reliability, calibration, annotation pipelines
  • Comfort with large-scale data processing (Spark, Dask, or similar)
  • Strong Python skills.
  • Experience with PyTorch or JAX.
  • Comfortable with GPU-based training workflows.

Benefits

Comp & perks
  • equity
  • benefits