Handshake

AI Research Scientist, Evaluation

Full-time

Location: 🇺🇸 United States (California, New York)

Salary

💰 $200,000 - $375,000 per year

Job Level

Mid-Level, Senior

Tech Stack

Python, PyTorch

About the role

  • Handshake is building the career network for the AI economy, connecting 18 million students and alumni, 1,500+ academic institutions across the U.S. and Europe, and 1 million employers
  • Handshake AI is a human data-labeling business that leverages the largest early-career network to build domain-specific data and evaluations at scale
  • Unique opportunity to join a fast-growing team shaping the future of AI through better data, better tools, and better systems—for experts, by experts
  • Design and conduct original research in LLM understanding, evaluation methodologies, and the dynamics of human-AI knowledge interaction
  • Develop novel evaluation frameworks and assessment techniques that reveal deep insights into model capabilities and limitations
  • Collaborate with engineers to transform research breakthroughs into scalable benchmarks and evaluation systems
  • Pioneer new approaches to measuring model understanding, reasoning capabilities, and alignment with human knowledge
  • Write high-quality code to support large-scale experimentation, evaluation, and knowledge assessment workflows (see the sketch after this list for the flavor of such work)
  • Publish findings at top-tier conferences and contribute to advancing the field’s understanding of AI capabilities
  • Work with cross-functional teams to establish new standards for responsible AI evaluation and knowledge alignment
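For a concrete sense of this work, here is a minimal sketch of one common evaluation pattern: scoring a multiple-choice benchmark item by the log-likelihood a model assigns to each answer. The model name, benchmark item, and `answer_logprob` helper are illustrative placeholders, not part of Handshake's stack.

```python
# Hypothetical sketch: multiple-choice evaluation by answer log-likelihood.
# All names (model checkpoint, benchmark item) are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any causal LM checkpoint works here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def answer_logprob(question: str, answer: str) -> float:
    """Sum of token log-probs the model assigns to `answer` given `question`."""
    prompt_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i+1 of the input sequence.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item()
        for pos in range(prompt_len - 1, full_ids.shape[1] - 1)
    )

# Toy benchmark item (illustrative, not drawn from a real benchmark)
item = {
    "question": "Q: What is the capital of France?\nA:",
    "choices": [" Paris", " London", " Berlin"],
    "label": 0,
}
scores = [answer_logprob(item["question"], c) for c in item["choices"]]
pred = max(range(len(scores)), key=scores.__getitem__)
print("correct" if pred == item["label"] else "incorrect", scores)
```

A production harness layers batching, length normalization of answer scores, and thousands of items on top of this, but the core loop looks the same.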

Requirements

  • PhD or equivalent research experience in machine learning, computer science, cognitive science, or a related field with a focus on AI evaluation or understanding
  • Strong background in LLM research, model evaluation methodologies, interpretability, or foundational AI assessment techniques
  • Demonstrated ability to independently lead post-training and evaluation research projects from theoretical framework to empirical validation
  • Proficiency in Python and deep experience with PyTorch for large-scale model analysis and evaluation
  • Experience designing and conducting experiments with large language models, benchmark development, or systematic model assessment
  • Strong publication record in post-training, AI evaluation, model understanding, interpretability, or related areas that advance our comprehension of AI capabilities
  • Ability to clearly communicate complex insights about model behavior, evaluation methodologies, and their implications for AI development
  • Extra Credit: Experience with RL, agent modeling, or AI alignment
  • Extra Credit: Familiarity with data-centric AI approaches, synthetic data generation, or human-in-the-loop systems
  • Extra Credit: Understanding of the challenges in scaling foundation models (e.g., training stability, safety, inference efficiency)
  • Extra Credit: Contributions to open-source AI libraries or research tooling
  • Extra Credit: Interest in shaping the societal impact, deployment ethics, and governance of frontier models