
Data Scientist – AI Quality, Evaluation
Bioscope AI
Full-time
Location Type: Hybrid
Location: Boston • Massachusetts • United States
About the role
- Design and implement automated evaluation pipelines that assess AI output quality, accuracy, and safety at scale
- Develop uncertainty quantification systems where confidence scores meaningfully correlate with accuracy
- Build comprehensive evaluation frameworks combining automated assessment with clinician-validated test cases
- Implement feedback loops that continuously improve model outputs based on validation signals
- Establish scalable quality gates that catch errors before they reach end users
- Contribute to model alignment and fine-tuning efforts
Requirements
- Strong foundation in deep learning frameworks (PyTorch) and LLM architectures
- Experience with model evaluation, benchmarking, and quality metrics
- Proficiency in Python and modern ML development tools
- Strong statistical foundations
- Ability to read, implement, and extend research papers
- Master's degree in Computer Science, Machine Learning, Statistics, or a related quantitative field (PhD preferred)
- Publications in top ML/AI venues (NeurIPS, ICML, ICLR, ACL)
- Experience with RLHF, DPO, or preference optimization techniques
- Background in healthcare AI or regulated industries
- Experience building evaluation systems for production LLM applications
Benefits
- Health insurance
- Professional development opportunities
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
deep learning, PyTorch, LLM architectures, model evaluation, benchmarking, quality metrics, Python, statistical foundations, RLHF, DPO
Soft Skills
ability to read research papers, implementation skills, extension of research
Certifications
Master's degree in Computer Science, Master's degree in Machine Learning, Master's degree in Statistics, PhD in related quantitative field