Data Scientist, Integrity Measurement

OpenAI

Full-time

Posted on: 2/25/2026

Location Type: Hybrid

Location: London • United Kingdom

Job Level

Mid-Level, Senior

Tech Stack

Python, SQL

About the role

In this role, you will:
own measurement and quantitative analysis for a group of severe, actor- and network-based usage harm verticals
develop and implement AI-first methods for prevalence measurement and other productionised safety metrics, which may need to draw on off-platform indicators or other non-standard datasets
build metrics that can be used for goal setting or A/B tests when prevalence or other top-line metrics are not suitable
own dashboards and metrics reporting for harm verticals
conduct analyses and generate insights that inform improvements to review, detection, or enforcement, and that influence roadmaps
optimise LLM prompts for the purpose of measurement
collaborate with other safety teams to understand key safety concerns and create relevant policies that support safety needs
provide metrics for leadership and external reporting
develop automation to scale your own work, leveraging our agentic products

Requirements

This role may suit you if you:
are a senior data scientist with trust and safety experience who can drive measurement direction
have deep statistics skills, specifically in sampling methods and prevalence estimation for complicated problem areas (ideally activity-based rather than content-based)
have experience working with severe and sensitive harm areas such as child safety or violence
are proficient in data programming languages (R or Python, and SQL)
(ideally) have experience with AI harms or with leveraging AI for measurement

Benefits

We are committed to providing reasonable accommodations to applicants with disabilities
Background checks for applicants will be administered in accordance with applicable law
Equal opportunity employer

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

measurement and quantitative analysis, AI-first methods, prevalence measurement, A/B testing, dashboard reporting, LLM prompt optimisation, data programming languages, statistics, sampling methods, prevalence estimation

Soft Skills

collaboration, insight generation, communication, leadership