Invisible Technologies

AI QA Trainer – LLM Evaluation

Employment Type: Contract

Location Type: Remote

Location: Anywhere in the World


Salary

💰 $6 - $65 per hour

About the role

  • Converse with the model across real-world scenarios and evaluation prompts
  • Verify factual accuracy and logical soundness
  • Design and run test plans and regression suites
  • Build clear rubrics and pass/fail criteria
  • Capture reproducible error traces with root-cause hypotheses
  • Suggest improvements to prompt engineering, guardrails, and evaluation metrics (e.g., precision/recall, faithfulness, toxicity, and latency SLOs)
  • Partner on adversarial red-teaming, automation (Python/SQL), and dashboarding to track quality deltas over time
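
A test plan with pass/fail rubric criteria like those described above can be sketched in the pytest style the posting names. This is a minimal illustration only: the rubric fields, the criteria, and the `fake_model` stub are assumptions, not the role's actual tooling.

```python
# Hypothetical sketch: a pass/fail rubric check for LLM outputs, runnable
# under pytest. All names and criteria here are illustrative assumptions.

RUBRIC = {
    "max_chars": 280,               # answers must stay concise
    "required_terms": ["refund"],   # answer must stay grounded in the topic
    "banned_terms": ["guarantee"],  # avoid overclaiming to the user
}

def passes_rubric(answer: str, rubric: dict) -> bool:
    """Return True only if the answer satisfies every rubric criterion."""
    text = answer.lower()
    if len(answer) > rubric["max_chars"]:
        return False
    if not all(term in text for term in rubric["required_terms"]):
        return False
    if any(term in text for term in rubric["banned_terms"]):
        return False
    return True

def fake_model(prompt: str) -> str:
    # Stand-in for a real inference call; replace with your model client.
    return "Our refund policy allows returns within 30 days."

def test_refund_answer_meets_rubric():
    answer = fake_model("What is the refund policy?")
    assert passes_rubric(answer, RUBRIC)
```

In practice each rubric criterion would map to an automated check or a human-graded item, and failing cases would be captured as reproducible error traces for root-cause analysis.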

Requirements

  • Bachelor’s, master’s, or PhD in computer science, data science, computational linguistics, statistics, or a related field is ideal
  • Experience shipping QA for ML/AI systems
  • Safety and red-teaming experience
  • Experience with test automation frameworks (e.g., pytest)
  • Hands-on work with LLM eval tooling (e.g., OpenAI Evals, RAG evaluators, W&B)
  • Skills that stand out include: evaluation rubric design, adversarial testing/red-teaming, regression testing at scale, bias/fairness auditing, grounding verification, prompt and system-prompt engineering, test automation (Python/SQL), and high-signal bug reporting
  • Clear, metacognitive communication ("showing your work") is essential

Benefits
  • This is a contract role; company-sponsored benefits such as health insurance do not apply
  • You’ll supply your own secure computer and high-speed internet connection
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, SQL, test automation, evaluation rubric design, adversarial testing, regression testing, bias auditing, grounding verification, prompt engineering, high-signal bug reporting
Soft Skills
metacognitive communication, collaboration, critical thinking, problem-solving