Senior Applied AI Scientist

Senior Applied AI Scientist evaluating and optimizing AI systems for Ro healthcare. Designing frameworks for LLM-powered products across patient experience and operations.

Posted 6/27/2026 • Full-time • New York City • New York • 🇺🇸 United States • Senior • 💰 $182,300 - $220,000 per year

Tech Stack

Tools & technologies

Python, SQL

About the role

Key responsibilities & impact

Design and own evaluation frameworks for production LLM features, including LLM-as-a-judge evaluations, regression suites, synthetic datasets, golden datasets, and human review workflows.
Analyze production behavior to identify quality issues, hallucinations, latency bottlenecks, cost regressions, and emerging failure modes.
Design and run experiments, including prompt variations, workflow changes, retrieval improvements, and model comparisons, and quantify their impact on quality, operational metrics, and user outcomes.
Define the metrics that matter and build dashboards that make AI performance visible across the organization.
Partner with engineering to determine which optimizations should be productionized and how to measure ongoing success.
Mentor teammates on experimental design, statistical rigor, evaluation methodology, and measurement best practices.

Requirements

What you’ll need

5+ years of experience in data science, applied machine learning, experimentation, or a closely related field, with at least the last year focused on applied LLMs or AI evaluation.
Strong Python and SQL skills with experience working on production data pipelines and experimentation.
You have experience designing reproducible evaluation frameworks rather than relying on manual spot checks or qualitative assessments.
You have strong statistical intuition: you think in terms of distributions, confidence intervals, variance, and sample sizes rather than anecdotes.
You’re comfortable working closely with engineers and product teams to translate experimental findings into production improvements.
Bonus: Experience with evaluation platforms (e.g. Braintrust, LangSmith, OpenAI Evals), experimentation platforms, causal inference, healthcare, or operations-heavy environments.

Benefits

Comp & perks

Competitive equity and benefits package

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Python, SQL, data science, applied machine learning, experimental design, statistical rigor, evaluation methodology, production data pipelines, evaluation frameworks, causal inference

Soft Skills

mentoring, collaboration, communication, analytical thinking, problem-solving