Welocalize

Arabic (Levantine) AI Evaluation Specialist

full-time

Location Type: Remote

Location: Egypt

Salary

💰 $10 per hour

About the role

  • Design scenario-based and edge-case prompts to test AI behavior, including trick and incomplete-information cases.
  • Develop evaluation rubrics to assess AI responses across instruction-following, factuality, tone, safety, refusals, and helpfulness.
  • Perform side-by-side evaluations of AI outputs and score them on a 1–5 scale using defined criteria.
  • Create high-quality source documents (articles, transcripts, reports) as the single source of truth for testing.
  • Write accurate and well-structured Golden Responses that correctly follow instructions and handle ambiguity.

Requirements

  • Bachelor's degree or equivalent experience in Linguistics, Computational Linguistics, Communications, Technical Writing, or a related analytical field.
  • English proficiency at B2 level or higher.
  • Native fluency in Arabic, covering both Modern Standard Arabic and the Levantine dialect.
  • Strong understanding of the distinction between Fusha and ‘Ammiyya.
  • Proven experience in a role involving AI data annotation, content quality review, search quality rating, or prompt engineering.
  • Ability to work independently and manage workflows effectively in a remote environment.
  • Proficiency in one or more additional Arabic dialects is a plus.
  • Strong attention to detail and critical thinking to identify hallucinations and bias.
  • Familiarity with data annotation platforms and model evaluation tools is a plus.
  • Cultural familiarity with regional norms and high-context communication styles, particularly in the GCC region.

Benefits

  • Limitless Flexibility
  • Limitless Growth
  • Limitless Support
  • Real Impact

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
AI data annotation, content quality review, search quality rating, prompt engineering, evaluation rubrics, Golden Responses, scenario-based testing, edge-case testing, critical thinking, attention to detail
Soft skills
independent work, workflow management, communication, cultural familiarity, high-context communication