Data Scientist – AI Quality, Evaluation

Design and implement automated evaluation pipelines that assess AI output quality, accuracy, and safety at scale.

Posted 3/25/2026 • Full-time • Boston • Massachusetts • 🇺🇸 United States • Mid-Level • Senior

Tools & technologies

Python, PyTorch

Key responsibilities & impact

Design and implement automated evaluation pipelines that assess AI output quality, accuracy, and safety at scale
Develop uncertainty quantification systems where confidence scores meaningfully correlate with accuracy
Build comprehensive evaluation frameworks combining automated assessment with clinician-validated test cases
Implement feedback loops that continuously improve model outputs based on validation signals
Establish scalable quality gates that catch errors before they reach end users
Contribute to model alignment and fine-tuning efforts
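
As an illustration of what "confidence scores meaningfully correlate with accuracy" can mean in practice, one common check is expected calibration error (ECE). The sketch below is only an assumption about how such a check might look, not this team's actual method; the function name, the equal-width binning, and the default of 10 bins are all hypothetical choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between accuracy and average confidence per bin.

    confidences: floats in [0, 1], one per model output (hypothetical input).
    correct:     booleans, True where the output was judged correct.
    n_bins:      number of equal-width confidence bins (an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one example is required")

    # Group examples by confidence; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes in proportion to how many examples it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Four outputs at 0.9 confidence, three of them correct:
# the model is overconfident by roughly 0.15.
gap = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A value near zero means stated confidence tracks observed accuracy; a quality gate could, for example, block a release when the gap exceeds an agreed threshold.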

What you’ll need

Strong foundation in deep learning frameworks (PyTorch) and LLM architectures
Experience with model evaluation, benchmarking, and quality metrics
Proficiency in Python and modern ML development tools
Strong statistical foundations
Ability to read, implement, and extend research papers
Master's degree in Computer Science, Machine Learning, Statistics, or related quantitative field (PhD preferred)
Publications in top ML/AI venues (NeurIPS, ICML, ICLR, ACL)
Experience with RLHF, DPO, or preference optimization techniques
Background in healthcare AI or regulated industries
Experience building evaluation systems for production LLM applications

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

deep learning, PyTorch, LLM architectures, model evaluation, benchmarking, quality metrics, Python, statistical foundations, RLHF, DPO

Soft Skills

ability to read research papers, implementation skills, extension of research

Certifications

Master's degree in Computer Science, Master's degree in Machine Learning, Master's degree in Statistics, PhD in related quantitative field