Arena

Machine Learning Scientist

full-time

Location Type: Hybrid

Location: Bay Area, California, United States

About the role

  • Design and conduct experiments to evaluate AI model behavior across reasoning, style, robustness, and user preference dimensions.
  • Develop new metrics, methodologies, and evaluation protocols that go beyond traditional benchmarks.
  • Analyze large-scale human voting and interaction data to uncover insights into model performance and user preferences.
  • Collaborate with engineers to implement and scale research findings into production systems.
  • Prototype and test research ideas rapidly, balancing rigor with iteration speed.
  • Author internal reports and external publications that contribute to the broader ML research community.
  • Partner with model providers to shape evaluation questions and support responsible model testing.
  • Contribute to the scientific integrity and transparency of the LMArena leaderboard and tools.

Requirements

  • Hands-on experience training large-scale models, including reward models, preference models, and fine-tuning LLMs with methods like RLHF, DPO, and contrastive learning.
  • Strong foundation in ML and statistics, with a track record of designing novel training objectives, evaluation schemes, or statistical frameworks to improve model reliability and alignment.
  • Fluent in the full experimental stack, from dataset design and large-batch training to rigorous evaluation and ablation, with an eye for what scales to production.
  • Deeply collaborative mindset, working closely with engineers to productionize research insights and iterating with product teams to align modeling goals with user needs.
  • PhD or equivalent research experience in Machine Learning, Natural Language Processing, Statistics, or a related field.
  • Strong understanding of LLMs and modern deep learning architectures and methods (e.g., Transformers, diffusion models, reinforcement learning from human feedback).
  • Proficiency in Python and ML research libraries such as PyTorch, JAX, or TensorFlow.
  • Demonstrated ability to design and analyze experiments with statistical rigor.
  • Experience publishing research or working on open-source projects in ML, NLP, or AI evaluation.
  • Comfortable working with real-world usage data and designing metrics beyond standard benchmarks.
  • Ability to translate research questions into practical systems and collaborate across engineering and product teams.
  • Passion for open science, reproducibility, and community-driven research.

Benefits

  • Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs.
  • Competitive compensation and equity aligned to the markets where our team members are based.
  • The opportunity to work on cutting-edge AI with a small, mission-driven team.
  • A culture that values transparency, trust, and community impact.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
training large-scale models, reward models, preference models, fine-tuning LLMs, RLHF, DPO, contrastive learning, experimental design, statistical frameworks, deep learning architectures
Soft Skills
collaborative mindset, iterative development, communication, problem-solving, passion for open science, translating research questions, working closely with engineers, aligning modeling goals, community-driven research, designing metrics
Certifications
PhD in Machine Learning, PhD in Natural Language Processing, PhD in Statistics