
Senior Data Scientist – LLM Evaluation
EvolutionIQ
Full-time
Location Type: Remote
Location: New York • United States
Salary
💰 $200,000 - $240,000 per year
About the role
- Design and implement comprehensive scorecards and benchmarking suites for LLM-based extraction, summarization, and chat interfaces
- Act as the technical lead in working with Subject Matter Experts to codify their expertise into evaluation datasets and "ground truth" labels
- Design the statistical guardrails to scale both our human and automated labeling efforts
- Provide clear, data-driven "Go/No-Go" recommendations for model deployment
Requirements
- 5+ years of experience in Data Science with a strong background in traditional statistics
- 2+ years of focused experience working with LLMs, specifically in evaluation, benchmarking, and prompt auditing
- Master’s or PhD in Statistics, Mathematics, or a related quantitative field
- Proficient in Python (Pandas, Scikit-learn, Statsmodels) and SQL
- Familiarity with LLM evaluation frameworks is a major plus
- Proven ability to work with non-technical SMEs to translate their qualitative feedback into quantitative metrics
Benefits
- Health insurance
- 401k matching
- Paid time off
- 100% paid parental leave
- Flexible schedule for new parents returning to work
- $1,000/year for each employee for professional development
- Tuition reimbursement
- Catered lunches
- Happy hours
- Pet-friendly spaces
- Monthly technology stipend
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Data Science, traditional statistics, LLMs, evaluation, benchmarking, prompt auditing, Python, Pandas, Scikit-learn, SQL
Soft skills
technical lead, communication, collaboration, data-driven decision making, translating qualitative feedback
Certifications
Master’s in Statistics, PhD in Statistics, PhD in Mathematics