
Red Team Manager, Training, Quality, Roleplay Excellence
mpathic
full-time
Location Type: Remote
Location: United States
About the role
- Train & Lead Red Team Reviewers
- Onboard new Red Team reviewers and run recurring calibration sessions to align on quality standards.
- Set expectations and maintain consistency across reviewers for evaluation depth, writing quality, and reproducibility.
- Build workflows for review (sampling, escalation, dispute resolution, feedback loops).
- Train Experts on Roleplays, Model Behavior & Harm
- Train red team experts on how to roleplay realistic user scenarios—including vulnerable users—without sensationalism.
- Teach systematic adversarial techniques (prompt escalation, persistence strategies, boundary probing).
- Help experts understand model failure modes: policy boundary drift, refusal weaknesses, hallucinations, unsafe compliance, and tone failures.
- Create Training Materials & Resources
- Build and maintain: red team playbooks and rubrics; example libraries (“gold standard” roleplays + evaluations); a defect taxonomy (what counts as a meaningful finding vs. noise); and brief modules for domain harm areas (self-harm, minors, extremism, medical, fraud, harassment, etc.).
- Write clear guidance that enables new hires to become productive quickly.
- Review & Evaluate Vulnerable User Roleplays
- Review vulnerable-user roleplays produced by experts for realism, safety relevance, and correct targeting of failure modes.
- Ensure roleplays are behaviorally plausible, ethically framed, actionable for model improvement, and consistent with internal policies and customer expectations.
- Create Vulnerable User Roleplays
- Personally produce high-quality vulnerable-user roleplays, including ambiguous edge cases, multi-turn scenarios, culturally nuanced or emotionally realistic interactions, and scenarios that stress safety, tone, and reliability.
- Review Hiring Applicants
- Own parts of the hiring loop for red team experts and reviewers: design work samples, evaluate candidate submissions, and provide structured feedback and hiring recommendations.
- Help build a scalable standard for what “great” looks like in this role.
Requirements
- 4+ years in trust & safety, AI evaluation, red teaming, security testing, content integrity, or similar applied roles.
- Strong experience building training programs, rubrics, or QA frameworks for human judgment work.
- Ability to evaluate roleplays and adversarial scenarios with consistency and high signal-to-noise.
- Excellent written communication—clear, structured, and test-case oriented.
- Experience leading or mentoring teams in fast-moving environments.
- Experience red teaming LLMs, agentic systems, or tool-using models (prompt injection, data exfiltration, policy probing).
- Familiarity with evaluation methods: gold sets, inter-rater reliability (or strong proxy measurement instincts), sampling strategies, and quality gates.
- Background in one or more harm domains (self-harm, medical, violence, fraud, extremism, harassment).
- Experience scaling an operational team and improving productivity without quality loss.
Benefits
- Health insurance
- Professional development