Senior Research Scientist, Reward Models

Anthropic

full-time

Posted on: 12/17/2025

Location Type: Hybrid

Location: San Francisco • California • 🇺🇸 United States

Salary

💰 $340,000 - $425,000 per year

Job Level

Senior

About the role

Lead research on novel reward model architectures and training approaches for RLHF
Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability
Research techniques to detect, characterize, and mitigate reward hacking and specification gaming
Design experiments to understand reward model generalization, robustness, and failure modes
Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines
Contribute to research publications, blog posts, and internal documentation
Mentor other researchers and help build institutional knowledge around reward modeling

Requirements

A track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning
Experience training and evaluating reward models for large language models
Comfort designing and running large-scale experiments with significant computational resources
Ability to work effectively across research and engineering, iterating quickly while maintaining scientific rigor
Enjoyment of collaborative research and the ability to communicate complex ideas clearly to diverse audiences
Deep care for building AI systems that are both highly capable and safe

Strong candidates may also have:

Published research on reward modeling, preference learning, or RLHF
Experience with LLM-as-judge approaches, including calibration and reliability challenges
Worked on reward hacking, specification gaming, or related robustness problems
Experience with constitutional AI, debate, or other scalable oversight approaches
Contributed to production ML systems at scale
Familiarity with interpretability techniques as applied to understanding reward model behavior

Benefits

Competitive compensation and benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills

reward modeling, RLHF, large language models, training models, evaluating models, designing experiments, interpretability techniques, calibration, reliability challenges, constitutional AI

Soft skills

collaborative research, communication, mentoring, scientific rigor, iterative development, building institutional knowledge, problem-solving, clear idea presentation, cross-functional teamwork, passion for AI safety