Design, test, and iteratively refine complex and creative prompts to enhance AI model capabilities in reasoning, instruction following, and contextual understanding.
Conduct rigorous side-by-side (SxS) comparisons of AI-generated outputs, providing detailed rationales and quality ratings to identify the superior response.
Develop 'golden' datasets, ideal responses, and granular evaluation rubrics to serve as benchmarks for model training and performance analysis.
Engineer adversarial prompts and red-team scenarios to systematically identify model vulnerabilities, biases, and safety gaps across various policies (e.g., Harassment, Hate Speech, Dangerous Content).
Create, annotate, and review diverse datasets across text, audio, and video formats to support model training and localization.
Perform in-depth fact-checking and analysis to ensure model responses are accurate, relevant, and grounded in reliable sources.
Establish and document style guides, content standards, and evaluation procedures to ensure consistency and quality across all projects.
Analyze model outputs to identify trends, document error patterns, and categorize failures, providing actionable feedback to engineering teams.
Train, mentor, and guide other team members on prompt engineering best practices, evaluation methodologies, and quality standards.
Collaborate with engineering and product teams to translate evaluation insights into actionable model improvements.
For voice projects, script and perform dialogues in a range of personas (e.g., customer, agent) and accents to generate realistic conversational data for AI voice agents.
For video projects, author precise text prompts using professional video production terminology (e.g., shot angles, camera movements, lighting) to guide generative video models.
Requirements
Bachelor's degree or equivalent experience in Linguistics, Computational Linguistics, Communications, Technical Writing, or a related analytical field.
Native or near-native fluency in English with exceptional writing and editorial skills.
Proven experience in a role involving AI data annotation, content quality review, search quality rating, or prompt engineering.
A highly detail-oriented and analytical mindset, with the ability to deconstruct complex instructions and evaluate outputs with precision.
Ability to interpret code, datasets, and system workflows at a conceptual level (no coding required).
Ability to work independently and manage workflows effectively in a remote environment.
Nice to Have
Multilingual proficiency in one or more languages in addition to English.
Direct experience with generative AI tools for text, voice, or video.
Background in QA testing, rubric design, or AI safety and ethics evaluation.
Familiarity with data annotation platforms and model evaluation tools.
Benefits
Professional development opportunities
Work remotely
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
AI data annotation, content quality review, search quality rating, prompt engineering, fact-checking, data analysis, rubric design, model evaluation, adversarial prompt engineering, localization