mpathic

Red Team Manager, Training, Quality, Roleplay Excellence

full-time

Location Type: Remote

Location: United States

About the role

Train & Lead Red Team Reviewers

  • Onboard new Red Team reviewers and run recurring calibration sessions to align on quality standards.
  • Set expectations and maintain consistency across reviewers for evaluation depth, writing quality, and reproducibility.
  • Build workflows for review (sampling, escalation, dispute resolution, feedback loops).

Train Experts on Roleplays, Model Behavior & Harm

  • Train red team experts on how to roleplay realistic user scenarios, including vulnerable users, without sensationalism.
  • Teach systematic adversarial techniques (prompt escalation, persistence strategies, boundary probing).
  • Help experts understand model failure modes: policy boundary drift, refusal weaknesses, hallucinations, unsafe compliance, and tone failures.

Create Training Materials & Resources

  • Build and maintain:
      • Red team playbooks and rubrics
      • Example libraries (“gold standard” roleplays + evaluations)
      • Defect taxonomy (what counts as a meaningful finding vs noise)
      • Brief modules for domain harm areas (self-harm, minors, extremism, medical, fraud, harassment, etc.)
  • Write clear guidance that enables new hires to become productive quickly.

Review & Evaluate Vulnerable User Roleplays

  • Review vulnerable-user roleplays produced by experts for realism, safety relevance, and correct targeting of failure modes.
  • Ensure roleplays are:
      • behaviorally plausible
      • ethically framed
      • actionable for model improvement
      • consistent with internal policies and customer expectations

Create Vulnerable User Roleplays

  • Personally produce high-quality vulnerable-user roleplays, including:
      • ambiguous edge cases
      • multi-turn scenarios
      • culturally nuanced or emotionally realistic interactions
      • scenarios that stress safety, tone, and reliability

Review Hiring Applicants

  • Own parts of the hiring loop for red team experts and reviewers:
      • design work samples
      • evaluate candidate submissions
      • provide structured feedback and hiring recommendations
  • Help build a scalable standard for what “great” looks like in this role.

Requirements

  • 4+ years in trust & safety, AI evaluation, red teaming, security testing, content integrity, or similar applied roles.
  • Strong experience building training programs, rubrics, or QA frameworks for human judgment work.
  • Ability to evaluate roleplays and adversarial scenarios with consistency and high signal-to-noise.
  • Excellent written communication—clear, structured, and test-case oriented.
  • Experience leading or mentoring teams in fast-moving environments.
  • Experience red teaming LLMs, agentic systems, or tool-using models (prompt injection, data exfiltration, policy probing).
  • Familiarity with evaluation methods: gold sets, inter-rater reliability (or strong proxy measurement instincts), sampling strategies, and quality gates.
  • Background in one or more harm domains (self-harm, medical, violence, fraud, extremism, harassment).
  • Experience scaling an operational team and improving productivity without quality loss.

Benefits

  • Health insurance
  • Professional development

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
red teaming, AI evaluation, security testing, training programs, QA frameworks, evaluation methods, prompt injection, data exfiltration, policy probing, roleplay evaluation
Soft Skills
leadership, mentoring, written communication, structured feedback, team building, consistency, organizational skills, adaptability, critical thinking, collaboration