Member of Technical Staff – Evals

Magic

As a Member of Technical Staff on Evals, you will develop evaluation systems for AI models at Magic. You will build trustworthy evaluations that inform research and product decisions and provide critical infrastructure.

Posted 6/27/2026 • Full-time • San Francisco, California, United States • Lead • $200,000 - $550,000 per year

About the role

Key responsibilities & impact

Build and maintain the internal evals platform used across Magic
Design, implement, and validate eval tasks for pre-training, post-training, reinforcement learning, inference, and product systems
Develop infrastructure for running large-scale evaluations
Build systems to measure dataset quality and identify opportunities to improve training data
Improve evaluation correctness, reproducibility, and reliability
Audit and improve upon public benchmarks, evaluation methodologies, and open-source implementations
Partner with research, data, inference, and product teams to define metrics that accurately reflect model quality
Build tooling and frameworks that enable teams across Magic to make decisions based on trustworthy measurements

Requirements

What you’ll need

Experience building production systems, internal platforms, or developer infrastructure
Experience working with machine learning systems, evaluation frameworks, data infrastructure, or research tooling
Track record of owning technical projects end-to-end
Skepticism toward results that cannot be reproduced, validated, or explained
Ability to reason critically about benchmarks, metrics, and experimental methodology
Experience designing, implementing, or operating systems that run at scale
Comfort navigating ambiguity and determining whether a measurement actually captures the behavior it claims to measure
Excitement about helping researchers and engineers make better decisions through trustworthy measurements

Benefits

Comp & perks

Equity is a significant part of total compensation, in addition to salary
401(k) plan with 6% salary matching
Generous health, dental, and vision insurance for you and your dependents
Unlimited paid time off
Visa sponsorship and relocation support for candidates moving to San Francisco
A small, fast-moving, highly collaborative team working on frontier AI systems

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

machine learning systems, evaluation frameworks, data infrastructure, production systems, internal platforms, developer infrastructure, evaluation methodologies, open-source implementations, large-scale evaluations, experimental methodology

Soft Skills

critical reasoning, skepticism, decision-making, navigating ambiguity, project ownership, collaboration, trustworthiness, measurement accuracy, problem-solving, communication