Wizard

Data Scientist – AI Evaluation

Full-time

Location Type: Remote

Location: United States

Salary

💰 $225,000 - $280,000 per year

About the role

  • Define and evolve accuracy metrics across the full shopping experience (retrieval, ranking, recommendations and outcomes)
  • Design and run experiments to measure improvements and regressions
  • Build and maintain evaluation datasets, benchmarks and scoring frameworks
  • Translate ambiguous product questions into clear, measurable hypotheses and analysis
  • Partner with ML Engineers to validate model changes and guide iteration
  • Identify failure modes and edge cases and drive improvements through data
  • Create dashboards and reporting that make agent performance visible, trusted and actionable

Requirements

  • 4-6+ years in Data Science, ML Evaluation, Applied AI, or similar roles
  • Deep experience evaluating AI/ML systems (ranking, recommendations, LLMs, etc.)
  • Strong experience with experimentation (A/B testing, causal inference)
  • Experience working on consumer products or user-facing systems, with exposure to marketplace or e-commerce systems
  • Ability to translate messy problems into structured analysis and metrics
  • Strong product mindset: you care about real user outcomes
  • Clear communication with the ability to influence across engineering and product

Benefits

  • Equity in the form of stock options
  • Medical, dental, and vision coverage
  • 401(k) plan
  • Flexible PTO and company holidays
  • Fully remote work within the United States
  • Periodic company offsites and team gatherings

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

data science, machine learning evaluation, A/B testing, causal inference, evaluation datasets, benchmarks, scoring frameworks, hypothesis analysis, model validation, data-driven improvements

Soft Skills

clear communication, influencing skills, product mindset, structured analysis, problem-solving, collaboration, adaptability, critical thinking, attention to detail, user outcome focus