Senior ML Evaluation Engineer – Autonomous Vehicles
NVIDIA
Senior ML Evaluation Engineer at NVIDIA, designing evaluation pipelines for autonomous vehicles and building systems that bridge ML research and production evaluation.
Posted 4/17/2026 • Full-time • Remote • California, District of Columbia, North Carolina, Washington • 🇺🇸 United States • Senior • 💰 $184,000 - $356,500 per year
Tech Stack
Tools & technologies: Python, PyTorch, Spark
About the role
Key responsibilities & impact
- Design and build learned evaluation pipelines that assess driving behavior using LLMs, VLMs, and multimodal models
- Develop agentic workflows that chain model inference, retrieval, and structured reasoning to evaluate complex driving scenarios
- Define evaluation-of-evaluation methodology — how do we know our learned evaluators are correct?
- Build golden-set frameworks and calibration loops for learned metrics
- Partner with AML (Alpamayo Logos) teams on model-specific eval needs (e.g., COT prediction quality, AML regression coverage)
- Instrument evaluation systems with robust experiment tracking, A/B comparison tooling, and model versioning
- Contribute to the team's transition from rule-based to learned evaluation: identify metrics and analyzers that are candidates for ML replacement and build the alternatives
Requirements
What you’ll need
- PhD with 4+ years, MS with 6+ years, or BS (or equivalent experience) with 8+ years of relevant experience in Computer Science, Computer Engineering, or a related technical field.
- Hands-on experience building LLM/VLM-based pipelines — fine-tuning, prompt engineering, retrieval-augmented generation, chain-of-thought
- Track record of shipping ML systems to production (not just prototyping or publishing)
- Strong software engineering fundamentals — you write clean, tested, reviewable code in Python and C++
- Experience with evaluation methodology: precision/recall, inter-rater reliability, calibration, annotation pipelines
- Comfort with large-scale data processing (Spark, Dask, or similar)
- Strong Python skills.
- Experience with PyTorch or JAX.
- Comfortable with GPU-based training workflows.
Benefits
Comp & perks
- Equity
- Benefits
ATS (Applicant Tracking System) Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
LLM, VLM, multimodal models, model inference, retrieval-augmented generation, chain-of-thought, Python, C++, PyTorch, JAX
Soft Skills
clean code, tested code, reviewable code
Certifications
PhD, MS, BS