Handshake

AI Research Scientist, Evaluation

Full-time

Location: 🇺🇸 United States (California, New York)

Salary

💰 $200,000 - $375,000 per year

Job Level

Mid-Level, Senior

Tech Stack

Python, PyTorch

About the role

  • Handshake is building the career network for the AI economy, connecting 18 million students and alumni, 1,500+ academic institutions across the U.S. and Europe, and 1 million employers
  • Handshake AI is a human data-labeling business that leverages the largest early-career network to build domain-specific data and evaluations at scale
  • Unique opportunity to join a fast-growing team shaping the future of AI through better data, better tools, and better systems—for experts, by experts
  • Design and conduct original research in LLM understanding, evaluation methodologies, and the dynamics of human-AI knowledge interaction
  • Develop novel evaluation frameworks and assessment techniques that reveal deep insights into model capabilities and limitations
  • Collaborate with engineers to transform research breakthroughs into scalable benchmarks and evaluation systems
  • Pioneer new approaches to measuring model understanding, reasoning capabilities, and alignment with human knowledge
  • Write high-quality code to support large-scale experimentation, evaluation, and knowledge assessment workflows (see the sketch after this list for the flavor of such work)
  • Publish findings at top-tier conferences and contribute to advancing the field’s understanding of AI capabilities
  • Work with cross-functional teams to establish new standards for responsible AI evaluation and knowledge alignment
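For a concrete sense of this work, here is a minimal sketch of one common evaluation pattern: scoring a multiple-choice benchmark item by the log-likelihood a model assigns to each answer. The model name, benchmark item, and `answer_logprob` helper are illustrative placeholders, not part of Handshake's stack.

```python
# Hypothetical sketch: multiple-choice evaluation by answer log-likelihood.
# All names (model checkpoint, benchmark item) are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any causal LM checkpoint works here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def answer_logprob(question: str, answer: str) -> float:
    """Sum of token log-probs the model assigns to `answer` given `question`."""
    prompt_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i+1 of the input sequence.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item()
        for pos in range(prompt_len - 1, full_ids.shape[1] - 1)
    )

# Toy benchmark item (illustrative, not drawn from a real benchmark)
item = {
    "question": "Q: What is the capital of France?\nA:",
    "choices": [" Paris", " London", " Berlin"],
    "label": 0,
}
scores = [answer_logprob(item["question"], c) for c in item["choices"]]
pred = max(range(len(scores)), key=scores.__getitem__)
print("correct" if pred == item["label"] else "incorrect", scores)
```

A production harness layers batching, length normalization of answer scores, and thousands of items on top of this, but the core loop looks the same.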

Requirements

  • PhD or equivalent research experience in machine learning, computer science, cognitive science, or a related field with a focus on AI evaluation or understanding
  • Strong background in LLM research, model evaluation methodologies, interpretability, or foundational AI assessment techniques
  • Demonstrated ability to independently lead post-training and evaluation research projects from theoretical framework to empirical validation
  • Proficiency in Python and deep experience with PyTorch for large-scale model analysis and evaluation
  • Experience designing and conducting experiments with large language models, benchmark development, or systematic model assessment
  • Strong publication record in post-training, AI evaluation, model understanding, interpretability, or related areas that advance our comprehension of AI capabilities
  • Ability to clearly communicate complex insights about model behavior, evaluation methodologies, and their implications for AI development
  • Extra Credit: Experience with RL, agent modeling, or AI alignment
  • Extra Credit: Familiarity with data-centric AI approaches, synthetic data generation, or human-in-the-loop systems
  • Extra Credit: Understanding of the challenges in scaling foundation models (e.g., training stability, safety, inference efficiency)
  • Extra Credit: Contributions to open-source AI libraries or research tooling
  • Extra Credit: Interest in shaping the societal impact, deployment ethics, and governance of frontier models