
Data Scientist – AI Evaluation
Wizard
full-time
Location Type: Remote
Location: United States
Salary
💰 $225,000 - $280,000 per year
About the role
- Define and evolve accuracy metrics across the full shopping experience (retrieval, ranking, recommendations and outcomes)
- Design and run experiments to measure improvements and regressions
- Build and maintain evaluation datasets, benchmarks and scoring frameworks
- Translate ambiguous product questions into clear, measurable hypotheses and analysis
- Partner with ML Engineers to validate model changes and guide iteration
- Identify failure modes and edge cases and drive improvements through data
- Create dashboards and reporting that make agent performance visible, trusted and actionable
Requirements
- 4-6+ years in Data Science, ML Evaluation, Applied AI, or similar roles
- Deep experience evaluating AI/ML systems (ranking, recommendations, LLMs, etc.)
- Strong experience with experimentation (A/B testing, causal inference)
- Experience working on consumer products or user-facing systems, and exposure to marketplace or e-commerce systems
- Ability to translate messy problems into structured analysis and metrics
- Strong product mindset: you care about real user outcomes
- Clear communication with the ability to influence across engineering and product
Benefits
- Equity in the form of stock options
- Medical, dental, and vision coverage
- 401(k) plan
- Flexible PTO and company holidays
- Fully remote work within the United States
- Periodic company offsites and team gatherings