Machine Learning Scientist - Open Source Lead

LMArena

full-time

Posted on: 8/20/2025

Location: California, United States


Senior

Python, PyTorch, TensorFlow

About the role

Lead open-source ML research, including open dataset and code releases, to advance AI model evaluation in the open.
Design, run, and share new methods and experiments that reveal what makes models useful, trustworthy, and capable. Ground this work in human preference signals and release it openly for the ecosystem and research community to build upon.
Turn our commitment to openness from principle into practice: curate high-impact datasets, develop new methodology and reproducible benchmarks, and release code that enables the research ecosystem to push AI evaluation forward.
Shape the public leaderboard, power community tools, and strengthen transparency in AI evaluation worldwide.
Work across disciplines with engineers, product teams, marketing, and the broader research community to advance how we compare models, analyze preference data, and understand factors such as style, reasoning, and robustness.
Collaborate with go-to-market (GTM) teams as a spokesperson for our open research efforts: strengthen partnerships, expand research community participation, and champion programs that grow and support our research network.
If you’re excited by open-ended questions, rigorous evaluation, and scientific communication and outreach, you’ll find a meaningful home here.

Qualifications

PhD or equivalent research experience in Machine Learning, Natural Language Processing, Statistics, or a related field
Willingness to use personal and professional platforms to amplify open research initiatives and invite collaboration
Strong understanding of LLMs and modern deep learning architectures and training methods (e.g., Transformers, diffusion models, reinforcement learning from human feedback)
Proficiency in Python and ML research libraries such as PyTorch, JAX, or TensorFlow
Demonstrated ability to design and analyze experiments with statistical rigor
Experience publishing research or working on open-source projects in ML, NLP, or AI evaluation
Comfortable working with real-world usage data and designing metrics beyond standard benchmarks
Ability to translate research questions into practical systems and collaborate across engineering and product teams
Passion for open science, reproducibility, and community-driven research