Senior ML Evaluation Engineer – Autonomous Vehicles

NVIDIA

NVIDIA is hiring a Senior ML Evaluation Engineer to design evaluation pipelines for autonomous vehicles and build systems that bridge ML research and production evaluation.

Posted: 4/17/2026
Employment type: Full-time
Location: Remote • California, District of Columbia, North Carolina, Washington • 🇺🇸 United States
Level: Senior
Salary: $184,000 - $356,500 per year

Tech Stack

Tools & technologies

Python, PyTorch, Spark

About the role

Key responsibilities & impact

Design and build learned evaluation pipelines that assess driving behavior using LLMs, VLMs, and multimodal models
Develop agentic workflows that chain model inference, retrieval, and structured reasoning to evaluate complex driving scenarios
Define evaluation-of-evaluation methodology — how do we know our learned evaluators are correct?
Build golden-set frameworks and calibration loops for learned metrics
Partner with AML (Alpamayo Logos) teams on model-specific eval needs (e.g., COT prediction quality, AML regression coverage)
Instrument evaluation systems with robust experiment tracking, A/B comparison tooling, and model versioning
Contribute to the team's transition from rule-based to learned evaluation: identify metrics and analyzers that are candidates for ML replacement and build the alternatives

Requirements

What you’ll need

PhD with 4+ years, MS with 6+ years, or BS (or equivalent experience) with 8+ years of relevant experience in Computer Science, Computer Engineering, or a related technical field.
Hands-on experience building LLM/VLM-based pipelines — fine-tuning, prompt engineering, retrieval-augmented generation, chain-of-thought
Track record of shipping ML systems to production (not just prototyping or publishing)
Strong software engineering fundamentals — you write clean, tested, reviewable code in Python and C++
Experience with evaluation methodology: precision/recall, inter-rater reliability, calibration, annotation pipelines
Comfort with large-scale data processing (Spark, Dask, or similar)
Strong Python skills.
Experience with PyTorch or JAX.
Comfortable with GPU-based training workflows.

Benefits

Comp & perks

equity
benefits

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

LLM, VLM, multimodal models, model inference, retrieval-augmented generation, chain-of-thought, Python, C++, PyTorch, JAX

Soft Skills

clean code, tested code, reviewable code

Certifications

PhD, MS, BS