Salary
💰 $220,000 - $280,000 per year
Tech Stack
Distributed Systems, Python, PyTorch, TensorFlow
About the role
- Luma is pushing the boundaries of generative AI, building tools that redefine how visual content is created
- Design and implement scalable pipelines for automated evaluation of generative models, focusing on visual and multimodal outputs (image, video, text, audio)
- Develop novel metrics and evaluation models capturing fidelity, coherence, temporal consistency, and alignment with human intent (a toy metric sketch follows this list)
- Integrate evaluation signals into training loops (including reinforcement learning and reward modeling) to improve model performance
- Build infrastructure for large-scale regression testing, benchmarking, and monitoring of multimodal generative models
- Collaborate with researchers running human studies to translate human evaluation frameworks into automated or semi-automated systems
- Partner with model researchers to identify failure cases and build targeted evaluation harnesses
- Maintain dashboards, reporting tools, and alerting systems to surface evaluation results to stakeholders
- Stay current with emerging evaluation techniques in generative AI, multimodal LLMs, and perceptual quality assessment
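To give a flavor of the metric work described above, here is a minimal, hypothetical sketch (not Luma's actual codebase) of one automated evaluation signal: scoring a generated clip's temporal consistency as the mean cosine similarity between embeddings of consecutive frames. The model choice (a pretrained ResNet-18), the input shape, and the metric definition are illustrative assumptions.

```python
# Hypothetical sketch only -- not Luma's actual pipeline. Scores a clip's
# temporal consistency as the mean cosine similarity between embeddings
# of consecutive frames; model choice and shapes are assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

def temporal_consistency(frames: torch.Tensor) -> float:
    """frames: (T, 3, H, W) decoded video frames, ImageNet-normalized."""
    model = resnet18(weights=ResNet18_Weights.DEFAULT)
    model.fc = torch.nn.Identity()  # keep pooled features as embeddings
    model.eval()
    with torch.no_grad():
        emb = model(frames)  # (T, 512)
    # Similarity between each frame and its successor; near 1.0 = smooth clip
    sims = F.cosine_similarity(emb[:-1], emb[1:], dim=1)
    return sims.mean().item()

if __name__ == "__main__":
    clip = torch.randn(8, 3, 224, 224)  # stand-in for a decoded 8-frame clip
    print(f"temporal consistency: {temporal_consistency(clip):.3f}")
```

In practice, a signal like this would be batched, cached, and tracked across model checkpoints so regressions surface automatically, which is the spirit of the regression-testing and dashboarding responsibilities above.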
Requirements
- Master's or PhD in Computer Science, Machine Learning, or a related technical field (or equivalent industry experience)
- 3+ years of experience building ML evaluation systems, model pipelines, or large-scale infrastructure
- Hands-on experience working with visual data (images and/or video), including evaluation, modeling, or data preparation
- Proficiency in Python and ML frameworks (PyTorch, JAX, or TensorFlow)
- Familiarity with human-in-the-loop evaluation workflows and how to scale them with automation
- Strong background in machine learning, with experience in generative models (diffusion models, LLMs, multimodal architectures)
- Strong software engineering skills (CI/CD, testing, data pipelines, distributed systems)
- Nice to have: Experience with reinforcement learning or reward modeling
- Nice to have: Prior work on perceptual metrics, multimodal evaluation benchmarks, or retrieval-based evaluation
- Nice to have: Background in large-scale model training or evaluation infrastructure
- Nice to have: Experience designing metrics for perceptual quality
- Nice to have: Familiarity with creative media workflows (film, VFX, animation, digital art)
- Nice to have: Contributions to open-source evaluation libraries or benchmarks