
Senior ML Evaluation Engineer – Autonomous Vehicles
NVIDIA
full-time
Location Type: Remote
Location: California • District of Columbia • United States
Salary
💰 $184,000 - $356,500 per year
About the role
- Design and build learned evaluation pipelines that assess driving behavior using LLMs, VLMs, and multimodal models
- Develop agentic workflows that chain model inference, retrieval, and structured reasoning to evaluate complex driving scenarios
- Define evaluation-of-evaluation methodology — how do we know our learned evaluators are correct?
- Build golden-set frameworks and calibration loops for learned metrics
- Partner with AML (Alpamayo Logos) teams on model-specific eval needs (e.g., COT prediction quality, AML regression coverage)
- Instrument evaluation systems with robust experiment tracking, A/B comparison tooling, and model versioning
- Contribute to the team's transition from rule-based to learned evaluation: identify metrics and analyzers that are candidates for ML replacement and build the alternatives
Requirements
- PhD with 4+ years, MS with 6+ years, or BS (or equivalent experience) with 8+ years of relevant experience in Computer Science, Computer Engineering, or a related technical field.
- Hands-on experience building LLM/VLM-based pipelines — fine-tuning, prompt engineering, retrieval-augmented generation, chain-of-thought
- Track record of shipping ML systems to production (not just prototyping or publishing)
- Strong software engineering fundamentals — you write clean, tested, reviewable code in Python and C++
- Experience with evaluation methodology: precision/recall, inter-rater reliability, calibration, annotation pipelines
- Comfort with large-scale data processing (Spark, Dask, or similar)
- Strong Python skills.
- Experience with PyTorch or JAX.
- Comfortable with GPU-based training workflows.
Benefits
- equity
- benefits