Werkstudent – AI Research, Data Evaluation

LILT

Evaluate LLMs and enhance AI research for multilingual tasks at LILT. Contribute to experimental research while gaining hands-on experience working with leading models and teams.

Posted 5/14/2026 • part-time • Berlin • 🇩🇪 Germany • Entry Level

Tech Stack

Tools & technologies

Python

About the role

Key responsibilities & impact

Evaluate & Benchmark: Run rigorous evaluations on frontier LLMs and autonomous agents across diverse tasks.
Data Engineering: Create or modify benchmark data to test the reasoning and linguistic limits of modern AI.
Experimental Research: Design and run experiments to identify "model-breaking points" and interpret the resulting data.

Requirements

What you’ll need

Currently enrolled at TU Berlin in Computer Science (Bachelor's or Master's) or a related field
Solid understanding of LLMs, natural language processing, or machine learning
Highly proficient in Python, Bash, and git
Appetite to quickly understand and incorporate new methodologies and models in a rapidly changing research landscape
Strong drive to ship high-quality customer projects, sometimes on tight deadlines
Proficient in English
Preferred: Proficient in one or more non-English languages

Benefits

Comp & perks

Work directly with models and teams from frontier labs
Opportunity to publish papers in top-tier AI/ML conferences
Contribute to industry-standard open-source benchmarks
Competitive salary
Hybrid environment with an on-site research team

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Python, Bash, git, natural language processing, machine learning, data engineering, experimental research, benchmarking, LLMs, model-breaking points

Soft Skills

drive to ship projects, ability to work under tight deadlines, quick learner, adaptability