
AI QA Trainer – LLM Evaluation
Invisible Technologies
Contract
Location Type: Remote
Location: Anywhere in the World
Salary
💰 $6 - $65 per hour
About the role
- Converse with the model on real-world scenarios and evaluation prompts
- Verify factual accuracy and logical soundness
- Design and run test plans and regression suites
- Build clear rubrics and pass/fail criteria
- Capture reproducible error traces with root-cause hypotheses
- Suggest improvements to prompt engineering, guardrails, and evaluation metrics (e.g., precision/recall, faithfulness, toxicity, and latency SLOs)
- Partner on adversarial red-teaming, automation (Python/SQL), and dashboarding to track quality deltas over time
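The rubric and regression-suite responsibilities above can be sketched in a few lines of Python. This is a minimal illustration, not Invisible's actual tooling: `get_model_answer`, the rubric fields, and the sample case are all hypothetical stand-ins.

```python
# Minimal sketch of a rubric-based regression check for LLM outputs.
# get_model_answer() is a hypothetical stand-in for the model under test.

def get_model_answer(prompt: str) -> str:
    """Hypothetical stub; in practice this would call the model under test."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
    }
    return canned.get(prompt, "")

def grade(answer: str, must_include: list[str], must_exclude: list[str]) -> dict:
    """Apply a simple pass/fail rubric and return a reproducible trace."""
    missing = [t for t in must_include if t.lower() not in answer.lower()]
    forbidden = [t for t in must_exclude if t.lower() in answer.lower()]
    return {
        "passed": not missing and not forbidden,
        "missing": missing,
        "forbidden": forbidden,
        "answer": answer,  # captured so failures are reproducible
    }

# Illustrative regression suite: each case pairs a prompt with pass/fail criteria.
REGRESSION_SUITE = [
    {
        "prompt": "What is the capital of France?",
        "must_include": ["Paris"],
        "must_exclude": ["Lyon"],  # guard against a plausible wrong answer
    },
]

def run_suite() -> list[dict]:
    results = []
    for case in REGRESSION_SUITE:
        answer = get_model_answer(case["prompt"])
        results.append({"prompt": case["prompt"],
                        **grade(answer, case["must_include"], case["must_exclude"])})
    return results

if __name__ == "__main__":
    for r in run_suite():
        print(("PASS" if r["passed"] else "FAIL") + ": " + r["prompt"])
```

In a real workflow each `grade` trace would feed a bug report or dashboard, and the suite would be rerun after every prompt or guardrail change to track quality deltas.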
Requirements
- A bachelor’s, master’s, or PhD in computer science, data science, computational linguistics, statistics, or a related field is ideal
- Shipped QA for ML/AI systems
- Safety/red-team experience
- Test automation frameworks (e.g., PyTest)
- Hands-on work with LLM eval tooling (e.g., OpenAI Evals, RAG evaluators, W&B)
- Skills that stand out include: evaluation rubric design, adversarial testing/red-teaming, regression testing at scale, bias/fairness auditing, grounding verification, prompt and system-prompt engineering, test automation (Python/SQL), and high-signal bug reporting
- Clear, metacognitive communication ("showing your work") is essential
Benefits
- This contract role does not include company-sponsored benefits such as health insurance
- You'll need to supply your own secure computer and a high-speed internet connection
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, SQL, test automation, evaluation rubric design, adversarial testing, regression testing, bias auditing, grounding verification, prompt engineering, high-signal bug reporting
Soft Skills
metacognitive communication, collaboration, critical thinking, problem-solving