Data Scientist – AI Evaluation

Wizard

Full-time

Posted on: 3/24/2026

Location Type: Remote

Location: United States

Salary

💰 $225,000 - $280,000 per year

Job Level

Mid-Level, Senior

About the role

Define and evolve accuracy metrics across the full shopping experience (retrieval, ranking, recommendations and outcomes)
Design and run experiments to measure improvements and regressions
Build and maintain evaluation datasets, benchmarks and scoring frameworks
Translate ambiguous product questions into clear, measurable hypotheses and analysis
Partner with ML Engineers to validate model changes and guide iteration
Identify failure modes and edge cases, and drive improvements through data
Create dashboards and reporting that make agent performance visible, trusted and actionable

Requirements

4-6+ years in Data Science, ML Evaluation, Applied AI, or similar roles
Deep experience evaluating AI/ML systems (ranking, recommendations, LLMs, etc.)
Strong experience with experimentation (A/B testing, causal inference)
Experience working on consumer products or user-facing systems, plus exposure to marketplace or e-commerce systems
Ability to translate messy problems into structured analysis and metrics
Strong product mindset: you care about real user outcomes
Clear communication with the ability to influence across engineering and product

Benefits

Equity in the form of stock options
Medical, dental, and vision coverage
401(k) plan
Flexible PTO and company holidays
Fully remote work within the United States
Periodic company offsites and team gatherings

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

data science, machine learning evaluation, A/B testing, causal inference, evaluation datasets, benchmarks, scoring frameworks, hypothesis analysis, model validation, data-driven improvements

Soft Skills

clear communication, influencing skills, product mindset, structured analysis, problem-solving, collaboration, adaptability, critical thinking, attention to detail, user outcome focus