Part-Time Student Worker – AI Validation and Benchmarking Engineer
Zoox
Part-time student worker contributing to AI validation and benchmarking at an autonomous ride-hailing company. The role involves working on real projects and collaborating with engineers on transportation challenges.
Tech Stack
Tools & technologies
- Python
About the role
Key responsibilities & impact
- Run and maintain the benchmark pipeline, analyzing results to identify routing errors and regressions across agent variants
- Build and expand ground truth datasets used to evaluate agent outputs against known-correct answers
- Identify and address gaps in benchmark validation and support building a more comprehensive evaluation infrastructure to improve validation prior to release
- Develop new evaluation dimensions such as label accuracy and structured output correctness beyond the existing team classification benchmarks
- Investigate failure modes in agent outputs and work with engineers to surface actionable improvements
- Write scripts and tooling to automate data collection, result parsing, and metric reporting
- Document findings, track benchmark trends over time, and present results to the team
Requirements
What you’ll need
- Currently enrolled in a B.S. or M.S. in Computer Science, Data Science, Engineering or a related field
- Available to commit to a minimum three-month assignment
- Able to commit to a minimum of 20 hours per week
- Able to work on-site at one of our office locations
- Must adhere to Zoox confidentiality requirements, including refraining from using or sharing proprietary company information outside of Zoox, such as in academic research, theses, publications, or presentations
- Familiar with Cursor or Claude
- Familiar with Python
- Familiar with evaluation concepts: precision, recall, F1 score, and confusion matrices
- Comfortable working with structured data (CSV, JSON)
- Experience modifying or writing reproducible analysis scripts
- Prior exposure to LLM-based systems, prompt engineering, or AI agent evaluation
- Experience with ticketing systems and messaging apps (e.g. Jira, Slack)
Benefits
Comp & perks
- If you need an accommodation to participate in the application or interview process please reach out to accommodations@zoox.com or your assigned recruiter.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Data Analysis
- Evaluation Concepts
- Script Writing
- Metric Reporting
- Structured Data Handling