Red Team Manager, Training, Quality, Roleplay Excellence

mpathic

full-time

Posted on: 1/25/2026

Location Type: Remote

Location: United States

Job Level

Mid-Level, Senior

About the role

Train & Lead Red Team Reviewers
Onboard new Red Team reviewers and run recurring calibration sessions to align on quality standards.
Set expectations and maintain consistency across reviewers for evaluation depth, writing quality, and reproducibility.
Build workflows for review (sampling, escalation, dispute resolution, feedback loops).
Train Experts on Roleplays, Model Behavior & Harm
Train red team experts on how to roleplay realistic user scenarios—including vulnerable users—without sensationalism.
Teach systematic adversarial techniques (prompt escalation, persistence strategies, boundary probing).
Help experts understand model failure modes: policy boundary drift, refusal weaknesses, hallucinations, unsafe compliance, and tone failures.
Create Training Materials & Resources
Build and maintain red team playbooks and rubrics; example libraries (“gold standard” roleplays and evaluations); a defect taxonomy (what counts as a meaningful finding vs. noise); and brief modules on domain harm areas (self-harm, minors, extremism, medical, fraud, harassment, etc.).
Write clear guidance that enables new hires to become productive quickly.
Review & Evaluate Vulnerable User Roleplays
Review vulnerable-user roleplays produced by experts for realism, safety relevance, and correct targeting of failure modes.
Ensure roleplays are behaviorally plausible, ethically framed, actionable for model improvement, and consistent with internal policies and customer expectations.
Create Vulnerable User Roleplays
Personally produce high-quality vulnerable-user roleplays, including ambiguous edge cases, multi-turn scenarios, culturally nuanced or emotionally realistic interactions, and scenarios that stress safety, tone, and reliability.
Review Hiring Applicants
Own parts of the hiring loop for red team experts and reviewers: design work samples, evaluate candidate submissions, and provide structured feedback and hiring recommendations.
Help build a scalable standard for what “great” looks like in this role.

Requirements

4+ years in trust & safety, AI evaluation, red teaming, security testing, content integrity, or similar applied roles.
Strong experience building training programs, rubrics, or QA frameworks for human judgment work.
Ability to evaluate roleplays and adversarial scenarios consistently and with a high signal-to-noise ratio.
Excellent written communication—clear, structured, and test-case oriented.
Experience leading or mentoring teams in fast-moving environments.
Experience red teaming LLMs, agentic systems, or tool-using models (prompt injection, data exfiltration, policy probing).
Familiarity with evaluation methods: gold sets, inter-rater reliability (or strong proxy-measurement instincts), sampling strategies, and quality gates.
Background in one or more harm domains (self-harm, medical, violence, fraud, extremism, harassment).
Experience scaling an operational team and improving productivity without quality loss.

Benefits

Health insurance
Professional development