AI Benchmark Engineer – Native Language Specialist, Chinese Mandarin
LILT AI
As an AI Benchmark Engineer, you will create benchmarks for large language models in multilingual contexts, collaborating globally to enhance AI's capability in diverse language environments.
Tech Stack
Tools & technologies
- Python
- Shell Scripting
About the role
Key responsibilities & impact
- Design, build, and validate benchmarks for Terminal-Bench tasks.
- Create high-signal, high-quality tasks in your native language.
- Find failure points in AI processing in your native language.
- Support the development of robust solutions and write reliable verifier scripts.
- Analyze execution logs and calibrate task difficulty.
- Participate in a 4-layer quality control process alongside automated checks.
Requirements
What you’ll need
- 5+ years of industry experience in software engineering.
- Proven track record at leading technology companies and/or graduation from top-tier engineering universities.
- Native or near-native fluency in Mandarin Chinese, with a deep understanding of its grammar, register, and phrasing rules.
- High English proficiency.
- Strong proficiency in Python, standard shell scripting, and data processing.
- Extensive experience with Terminal/CLI-based development workflows and a working familiarity with coding agents.
- Deep technical understanding of multilingual text processing pitfalls, including Unicode normalization.
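
To illustrate the kind of Unicode normalization pitfall the last requirement refers to, here is a minimal Python sketch. It is not taken from LILT's tooling: the helper name `normalized_equal` and the sample strings are assumptions chosen for illustration. It uses only the standard-library `unicodedata` module to show that visually identical strings (pinyin with precomposed versus combining tone marks, or full-width versus ASCII characters common in Chinese text) can fail a naive equality check.

```python
import unicodedata


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after applying the same normalization form.

    Hypothetical helper for illustration; `form` is any form accepted by
    unicodedata.normalize ("NFC", "NFD", "NFKC", "NFKD").
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Pinyin "nǐ hǎo" written two ways: precomposed letters versus
# base letters followed by U+030C COMBINING CARON.
composed = "n\u01d0 h\u01ceo"
decomposed = "ni\u030c ha\u030co"

# Full-width "Python3" (as typed with a CJK input method) versus ASCII.
fullwidth = "\uff30\uff59\uff54\uff48\uff4f\uff4e\uff13"
ascii_text = "Python3"

naive_pinyin_match = composed == decomposed                   # raw comparison
nfc_pinyin_match = normalized_equal(composed, decomposed)     # canonical forms agree
nfc_width_match = normalized_equal(fullwidth, ascii_text)     # NFC keeps full-width
nfkc_width_match = normalized_equal(fullwidth, ascii_text, "NFKC")  # compatibility fold

if __name__ == "__main__":
    print(naive_pinyin_match, nfc_pinyin_match, nfc_width_match, nfkc_width_match)
```

A verifier script that compares model output against an expected string would typically normalize both sides first. Whether NFC or NFKC is appropriate depends on whether width and other compatibility differences should count as errors for the task at hand.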
Benefits
Comp & perks
- Your schedule, your rules.
- Get paid quickly and fairly.
- Work on projects that actually matter.
- Be part of something bigger.
- Grow without limits.
- Have fun doing what you love.
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, shell scripting, data processing, multilingual text processing, Unicode normalization, benchmark design, task validation, verifier scripts, execution log analysis, task difficulty calibration
Soft Skills
leadership, communication, analytical thinking, problem-solving, attention to detail