
Arabic (Egyptian) AI Evaluation Specialist
Welocalize
contract
Location Type: Remote
Location: Egypt
Salary
💰 $10 per hour
About the role
- Support the testing and evaluation of an Arabic language model.
- Design prompts and evaluate AI responses for functionality, accuracy, and safety.
- Generate the best possible answers for the target audience.
- Design scenario-based and edge-case prompts to test AI behavior.
- Develop evaluation rubrics to assess AI responses across various criteria.
- Perform side-by-side evaluations of AI outputs and score them on a defined scale.
- Create high-quality source documents as the single source of truth for testing.
- Write accurate Golden Responses that follow the given instructions and handle ambiguity.
Requirements
- Bachelor's degree or equivalent experience in Linguistics, Computational Linguistics, Communications, Technical Writing, or a related analytical field.
- English proficiency at B2 level or higher.
- Native fluency in Arabic, covering both Modern Standard Arabic and the Egyptian dialect.
- Strong understanding of the distinction between Fusha (Modern Standard Arabic) and ‘Ammiyya (colloquial Arabic).
- Proven experience in a role involving AI data annotation, content quality review, search quality rating, or prompt engineering.
- Ability to work independently and manage workflows effectively in a remote environment.
- Proficiency in one or more additional Arabic dialects (nice to have).
- Strong attention to detail and critical thinking to identify hallucinations and bias (nice to have).
- Familiarity with data annotation platforms and model evaluation tools (nice to have).
- Experience in prompt engineering, AI evaluation, linguistic QA, or translation (nice to have).
- Cultural familiarity with regional norms and high-context communication styles, particularly in the GCC region (nice to have).
Benefits
- Limitless Flexibility
- Limitless Growth
- Limitless Support
- Real Impact
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Arabic language model evaluation, AI systems functionality assessment, prompt engineering, evaluation rubrics development, AI output scoring, Golden Responses writing, data annotation, content quality review, search quality rating, linguistic QA
Soft skills
attention to detail, critical thinking, independent work, workflow management