
Machine Learning Scientist – Open Source Lead
LMArena
full-time
Location Type: Hybrid
Location: Bay Area • California • 🇺🇸 United States
Job Level
Senior
Tech Stack
Python, PyTorch, TensorFlow
About the role
- Design and conduct experiments to evaluate AI model behavior across reasoning, style, robustness, and user preference dimensions
- Develop new metrics, methodologies, and evaluation protocols that go beyond traditional benchmarks
- Analyze large-scale human voting and interaction data to uncover insights into model performance and user preferences
- Communicate results to the broader research community through academic papers, educational content, and conference talks
- Collaborate with engineers to implement and scale research findings into production systems
- Prototype and test research ideas rapidly, balancing rigor with iteration speed
- Partner with model providers to shape evaluation questions and support responsible model testing
- Contribute to the scientific integrity and transparency of the LMArena leaderboard and tools
Requirements
- PhD or equivalent research experience in Machine Learning, Natural Language Processing, Statistics, or a related field
- Strong understanding of LLMs and modern deep learning architectures and methods (e.g., Transformers, diffusion models, reinforcement learning from human feedback)
- Proficiency in Python and ML research libraries such as PyTorch, JAX, or TensorFlow
- Demonstrated ability to design and analyze experiments with statistical rigor
- Experience publishing research or working on open-source projects in ML, NLP, or AI evaluation
- Comfortable working with real-world usage data and designing metrics beyond standard benchmarks
- Ability to translate research questions into practical systems and collaborate across engineering and product teams
- Passion for open science, reproducibility, and community-driven research
Benefits
- Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Machine Learning, Natural Language Processing, Statistics, LLMs, deep learning architectures, Transformers, diffusion models, reinforcement learning, Python, ML research libraries
Soft skills
communication, collaboration, analytical thinking, problem-solving, passion for open science, reproducibility, community-driven research
Certifications
PhD