Lead teams of researchers to produce original research in LLM evaluation methodologies, interpretability, and human-AI knowledge alignment.
Develop novel frameworks and assessment techniques that reveal deep insights into model capabilities, limitations, and emergent behaviors.
Collaborate with engineers to translate research breakthroughs into scalable benchmarks, evaluation systems, and standards.
Pioneer new approaches to measuring reasoning, alignment, and trustworthiness in frontier AI systems.
Author high-quality code to enable large-scale experimentation, reproducible evaluation, and knowledge assessment workflows.
Publish at top-tier conferences and in journals, establishing new directions in the science of AI evaluation.
Work cross-functionally with leadership, engineers, and external partners to set industry standards for responsible AI evaluation and alignment.
Requirements
PhD or equivalent research experience in machine learning, computer science, cognitive science, or a related field, with a focus on AI evaluation, interpretability, or model understanding.
6+ years of post-doctoral academic or industry experience in a research-first environment.
Strong background in LLM research, evaluation methodologies, and/or foundational AI assessment techniques.
Proven ability to independently design, lead, and execute end-to-end evaluation research programs involving novel data types.
Deep proficiency in Python and PyTorch for large-scale model analysis, benchmarking, and evaluation.
Experience building or leading novel benchmarks, systematic model assessments, or interpretability studies.
Strong publication record in post-training, evaluation, or interpretability that demonstrates field-defining contributions.
Ability to clearly communicate complex insights and influence both technical and non-technical stakeholders.