Zoox

Part-Time Student Worker – AI Validation and Benchmarking Engineer

Part-time student worker role contributing to AI validation and benchmarking at an autonomous ride-hailing company. You will work on real projects and collaborate with engineers on transportation challenges.

Posted 7/3/2026 • Part-time • Foster City, California, 🇺🇸 United States • Entry Level

Tech Stack

Tools & technologies
Python

About the role

Key responsibilities & impact
  • Run and maintain the benchmark pipeline, analyzing results to identify routing errors and regressions across agent variants
  • Build and expand ground truth datasets used to evaluate agent outputs against known-correct answers
  • Identify and close gaps in benchmark validation, and help build a more comprehensive evaluation infrastructure to strengthen validation prior to release
  • Develop new evaluation dimensions such as label accuracy and structured output correctness beyond the existing team classification benchmarks
  • Investigate failure modes in agent outputs and work with engineers to surface actionable improvements
  • Write scripts and tooling to automate data collection, result parsing, and metric reporting
  • Document findings, track benchmark trends over time, and present results to the team

Requirements

What you’ll need
  • Currently enrolled in a B.S. or M.S. in Computer Science, Data Science, Engineering or a related field
  • Available to commit to a minimum three-month assignment
  • Able to commit to a minimum of 20 hours per week
  • Able to work on-site at one of our office locations
  • Must adhere to Zoox confidentiality requirements, including refraining from using or sharing proprietary company information outside of Zoox, such as in academic research, theses, publications, or presentations
  • Familiar with Cursor or Claude
  • Familiar with Python
  • Familiar with evaluation concepts: precision, recall, F1 score, and confusion matrices
  • Comfortable working with structured data (CSV, JSON)
  • Experience modifying or writing reproducible analysis scripts
  • Prior exposure to LLM-based systems, prompt engineering, or AI agent evaluation
  • Experience with ticketing systems and messaging apps (e.g. Jira, Slack)

Benefits

Comp & perks
  • If you need an accommodation to participate in the application or interview process, please reach out to accommodations@zoox.com or your assigned recruiter.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • Data Analysis
  • Evaluation Concepts
  • Script Writing
  • Metric Reporting
  • Structured Data Handling