LMArena

Machine Learning Scientist – Open Source Lead

Full-time

Location Type: Hybrid

Location: Bay Area • California • 🇺🇸 United States

Job Level

Senior

Tech Stack

Python, PyTorch, TensorFlow

About the role

  • Design and conduct experiments to evaluate AI model behavior across reasoning, style, robustness, and user preference dimensions
  • Develop new metrics, methodologies, and evaluation protocols that go beyond traditional benchmarks
  • Analyze large-scale human voting and interaction data to uncover insights into model performance and user preferences
  • Communicate results to the broader research community through academic papers, educational content, and conference talks
  • Collaborate with engineers to implement and scale research findings into production systems
  • Prototype and test research ideas rapidly, balancing rigor with iteration speed
  • Partner with model providers to shape evaluation questions and support responsible model testing
  • Contribute to the scientific integrity and transparency of the LMArena leaderboard and tools

Requirements

  • PhD or equivalent research experience in Machine Learning, Natural Language Processing, Statistics, or a related field
  • Strong understanding of LLMs and modern deep learning architectures and training methods (e.g., Transformers, diffusion models, reinforcement learning from human feedback)
  • Proficiency in Python and ML research libraries such as PyTorch, JAX, or TensorFlow
  • Demonstrated ability to design and analyze experiments with statistical rigor
  • Experience publishing research or working on open-source projects in ML, NLP, or AI evaluation
  • Comfortable working with real-world usage data and designing metrics beyond standard benchmarks
  • Ability to translate research questions into practical systems and collaborate across engineering and product teams
  • Passion for open science, reproducibility, and community-driven research

Benefits

  • Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs