
Senior Research Scientist, Reward Models
Anthropic
full-time
Location Type: Hybrid
Location: San Francisco • California • 🇺🇸 United States
Salary
💰 $340,000 - $425,000 per year
Job Level
Senior
About the role
- Lead research on novel reward model architectures and training approaches for RLHF
- Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability
- Research techniques to detect, characterize, and mitigate reward hacking and specification gaming
- Design experiments to understand reward model generalization, robustness, and failure modes
- Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines
- Contribute to research publications, blog posts, and internal documentation
- Mentor other researchers and help build institutional knowledge around reward modeling
Requirements
- Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning
- Have experience training and evaluating reward models for large language models
- Are comfortable designing and running large-scale experiments with significant computational resources
- Work effectively across research and engineering, iterating quickly while maintaining scientific rigor
- Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences
- Care deeply about building AI systems that are both highly capable and safe
Strong candidates may also have
- Published research on reward modeling, preference learning, or RLHF
- Experience with LLM-as-judge approaches, including their calibration and reliability challenges
- Worked on reward hacking, specification gaming, or related robustness problems
- Experience with constitutional AI, debate, or other scalable oversight approaches
- Contributed to production ML systems at scale
- Familiarity with interpretability techniques as applied to understanding reward model behavior
Benefits
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
reward modeling, RLHF, large language models, training models, evaluating models, designing experiments, interpretability techniques, calibration, reliability challenges, constitutional AI
Soft skills
collaborative research, communication, mentoring, scientific rigor, iterative development, building institutional knowledge, problem-solving, clear idea presentation, cross-functional teamwork, passion for AI safety