You.com

Senior AI Scientist

full-time

Location Type: Remote

Location: California, United States

Salary

💰 $160,000 - $200,000 per year

About the role

  • Define and own what “good” means for search-augmented and agentic AI systems by designing evaluation frameworks that measure real-world quality, reliability, and user-relevant behavior beyond standard benchmarks.
  • Invent and validate novel evaluation methodologies for non-deterministic systems (LLMs, agents, RAG), including behavioral evals, long-tail and adversarial test sets, and task-specific metrics.
  • Develop rigorous statistical frameworks for model comparison, regression detection, and uncertainty estimation, ensuring evaluation results are defensible and decision-ready.
  • Build and maintain scalable evaluation systems—datasets, gold standards, eval harnesses, scoring pipelines, and analysis tooling—that can be reused across products and customers.
  • Lead customer-facing evaluation research, working directly with enterprise customers to translate domain-specific quality requirements into credible, actionable evals that support product decisions and sales outcomes.
  • Drive competitive evaluations and internal quality reviews, surfacing meaningful performance differences, trade-offs, and failure modes to inform product strategy and prioritization.
  • Partner with engineering and product teams to integrate evals into development loops, release gating, and ongoing quality monitoring.
  • Mentor and set standards for evaluation practice, reviewing eval designs, guiding other scientists, and shaping the long-term evals roadmap as systems become more agentic and complex.
  • End-to-End Project Leadership: Lead the development of new AI-driven projects, encompassing ideation, prototyping, research, infrastructure design, scalability, monitoring, and evaluation.
  • Rapid Iteration: Adapt quickly to user feedback and evolving requirements, ensuring continuous improvement in a fast-paced environment.

Requirements

  • Strong grounding in applied ML and statistics, with experience evaluating non-deterministic AI systems (LLMs, agents, RAG, search).
  • Deep experience with AI evaluation, including metric design, gold dataset creation, head-to-head comparisons, slicing, and error analysis.
  • Statistical rigor in model comparison, using methods such as paired tests, bootstrap confidence intervals, and robustness analyses.
  • Proficiency in Python for evaluation and analysis, including building eval harnesses, data pipelines, scoring logic, and reproducible analysis workflows.
  • Ability to translate vague product or customer goals into measurable evaluation criteria, and to challenge metrics or conclusions that don’t reflect real quality.
  • Comfort engaging directly with customers and cross-functional stakeholders, explaining evaluation results, trade-offs, and limitations clearly.
  • Strong written and verbal communication, including documenting methodologies and contributing to external publications or talks.
Benefits
  • Hubs in San Francisco and New York City offering regular in-person gatherings and co-working sessions
  • Flexible PTO with U.S. holidays observed and a one-week shutdown in December to rest and recharge*
  • A competitive health insurance plan that covers 100% for the policyholder and 75% for dependents*
  • 12 weeks of paid parental leave in the US*
  • 401k program, 3% match - vested immediately!*
  • $500 work-from-home stipend to be used within a year of your start date*
  • $1,200 per year Health & Wellness Allowance to support your personal goals*
  • The chance to collaborate with a team at the forefront of AI research
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
applied ML, statistics, AI evaluation, metric design, gold dataset creation, model comparison, Python, data pipelines, scoring logic, error analysis
Soft Skills
communication, customer engagement, cross-functional collaboration, mentoring, project leadership, adaptability, problem-solving, translating goals, continuous improvement, documentation