Luma AI

Research Engineer - Evaluations

Full-time

Origin: 🇺🇸 United States • California

Salary

💰 $220,000 - $280,000 per year

Job Level

Mid-Level, Senior

Tech Stack

Distributed Systems, Python, PyTorch, TensorFlow

About the role

  • Luma is pushing the boundaries of generative AI, building tools that redefine how visual content is created
  • Design and implement scalable pipelines for automated evaluation of generative models, focusing on visual and multimodal outputs (image, video, text, audio)
  • Develop novel metrics and evaluation models capturing fidelity, coherence, temporal consistency, and alignment with human intent
  • Integrate evaluation signals into training loops (including reinforcement learning and reward modeling) to improve model performance
  • Build infrastructure for large-scale regression testing, benchmarking, and monitoring of multimodal generative models
  • Collaborate with researchers running human studies to translate human evaluation frameworks into automated or semi-automated systems
  • Partner with model researchers to identify failure cases and build targeted evaluation harnesses
  • Maintain dashboards, reporting tools, and alerting systems to surface evaluation results to stakeholders
  • Stay current with emerging evaluation techniques in generative AI, multimodal LLMs, and perceptual quality assessment

Requirements

  • Master's or PhD in Computer Science, Machine Learning, or a related technical field (or equivalent industry experience)
  • 3+ years of experience building ML evaluation systems, model pipelines, or large-scale infrastructure
  • Hands-on experience working with visual data (images and/or video), including evaluation, modeling, or data preparation
  • Proficiency in Python and ML frameworks (PyTorch, JAX, or TensorFlow)
  • Familiarity with human-in-the-loop evaluation workflows and how to scale them with automation
  • Strong background in machine learning, with experience in generative models (diffusion, LLMs, multimodal architectures)
  • Strong software engineering skills (CI/CD, testing, data pipelines, distributed systems)
  • Nice to have: Experience with reinforcement learning or reward modeling
  • Nice to have: Prior work on perceptual metrics, multimodal evaluation benchmarks, or retrieval-based evaluation
  • Nice to have: Background in large-scale model training or evaluation infrastructure
  • Nice to have: Experience designing metrics for perceptual quality
  • Nice to have: Familiarity with creative media workflows (film, VFX, animation, digital art)
  • Nice to have: Contributions to open-source evaluation libraries or benchmarks