Salary
💰 $200,000–$375,000 per year
About the role
- Handshake is building the career network for the AI economy, connecting 18 million students and alumni, 1,500+ academic institutions across the U.S. and Europe, and 1 million employers
- Handshake AI is a human data-labeling business that leverages the largest early-career network to build domain-specific data and evaluations at scale
- Unique opportunity to join a fast-growing team shaping the future of AI through better data, better tools, and better systems: for experts, by experts
- Design and conduct original research in LLM understanding, evaluation methodologies, and the dynamics of human-AI knowledge interaction
- Develop novel evaluation frameworks and assessment techniques that reveal deep insights into model capabilities and limitations
- Collaborate with engineers to transform research breakthroughs into scalable benchmarks and evaluation systems
- Pioneer new approaches to measuring model understanding, reasoning capabilities, and alignment with human knowledge
- Write high-quality code to support large-scale experimentation, evaluation, and knowledge assessment workflows
- Publish findings in top-tier conferences and contribute to advancing the field’s understanding of AI capabilities
- Work with cross-functional teams to establish new standards for responsible AI evaluation and knowledge alignment
Requirements
- PhD or equivalent research experience in machine learning, computer science, cognitive science, or a related field, with a focus on AI evaluation or understanding
- Strong background in LLM research, model evaluation methodologies, interpretability, or foundational AI assessment techniques
- Demonstrated ability to independently lead post-training and evaluation research projects from theoretical framework to empirical validation
- Proficiency in Python and deep experience with PyTorch for large-scale model analysis and evaluation
- Experience designing and conducting experiments with large language models, benchmark development, or systematic model assessment
- Strong publication record in post-training, AI evaluation, model understanding, interpretability, or related areas that advance our comprehension of AI capabilities
- Ability to clearly communicate complex insights about model behavior, evaluation methodologies, and their implications for AI development
- Extra Credit: Experience with reinforcement learning (RL), agent modeling, or AI alignment
- Extra Credit: Familiarity with data-centric AI approaches, synthetic data generation, or human-in-the-loop systems
- Extra Credit: Understanding of the challenges in scaling foundation models (e.g., training stability, safety, inference efficiency)
- Extra Credit: Contributions to open-source AI libraries or research tooling
- Extra Credit: Interest in shaping the societal impact, deployment ethics, and governance of frontier models