Handshake

Staff AI Research Scientist – Evaluation

Handshake

Full-time

Location Type: Office

Location: San Francisco • California • 🇺🇸 United States

Salary

💰 $350,000 - $420,000 per year

Job Level

Lead

Tech Stack

Python, PyTorch

About the role

  • Lead teams of researchers to produce original research in LLM evaluation methodologies, interpretability, and human-AI knowledge alignment.
  • Develop novel frameworks and assessment techniques that reveal deep insights into model capabilities, limitations, and emergent behaviors.
  • Collaborate with engineers to translate research breakthroughs into scalable benchmarks, evaluation systems, and standards.
  • Pioneer new approaches to measuring reasoning, alignment, and trustworthiness in frontier AI systems.
  • Author high-quality code to enable large-scale experimentation, reproducible evaluation, and knowledge assessment workflows.
  • Publish in top-tier conferences and journals, establishing new directions in the science of AI evaluation.
  • Work cross-functionally with leadership, engineers, and external partners to set industry standards for responsible AI evaluation and alignment.

Requirements

  • PhD or equivalent research experience in machine learning, computer science, cognitive science, or related fields with focus on AI evaluation, interpretability, or model understanding.
  • 6+ years of post-doctoral academic or industry experience in a research-first environment.
  • Strong background in LLM research, evaluation methodologies, and/or foundational AI assessment techniques.
  • Proven ability to independently design, lead, and execute end-to-end evaluation research programs involving novel data types.
  • Deep proficiency in Python and PyTorch for large-scale model analysis, benchmarking, and evaluation.
  • Experience building or leading novel benchmark development, systematic model assessment, or interpretability studies.
  • Strong publication record in post-training, evaluation, or interpretability that demonstrates field-defining contributions.
  • Ability to clearly communicate complex insights and influence both technical and non-technical stakeholders.

Benefits

  • Ownership: Equity in a fast-growing company
  • Financial Wellness: 401(k) match, competitive compensation, financial coaching
  • Family Support: Paid parental leave, fertility benefits, parental coaching
  • Wellbeing: Medical, dental, and vision coverage; mental health support; $500 wellness stipend
  • Growth: $2,000 learning stipend, ongoing development
  • Remote & Office: Stipends for home office setup, internet, and commuting; free lunch and gym in our SF office
  • Time Off: Flexible PTO, 15 holidays + 2 flex days, winter #ShakeBreak where our whole office closes for a week!
  • Connection: Team outings & referral bonuses

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
machine learning, LLM evaluation methodologies, interpretability, model understanding, Python, PyTorch, benchmark development, systematic model assessment, evaluation research, data analysis
Soft skills
leadership, communication, collaboration, influence, independent research, problem-solving, critical thinking, cross-functional teamwork, creativity, adaptability
Certifications
PhD