Tech Stack
Tools & technologies
- Python
About the role
Key responsibilities & impact
- Evaluate & Benchmark: Run rigorous evaluations on frontier LLMs and autonomous agents across diverse tasks.
- Data Engineering: Create or modify benchmark data to test the reasoning and linguistic limits of modern AI.
- Experimental Research: Design and run experiments to identify "model-breaking points" and interpret the resulting data.
Requirements
What you’ll need
- Currently enrolled at TU Berlin, majoring in Computer Science (Bachelor's or Master's) or a related field
- Solid understanding of LLMs, natural language processing, or machine learning
- Highly proficient in Python, Bash, and Git
- Appetite to quickly understand and incorporate new methodologies and models in a rapidly changing research landscape
- Strong drive to ship high-quality customer projects, sometimes on tight deadlines
- Proficient in English
- Preferred: proficiency in one or more languages other than English
Benefits
Comp & perks
- Work directly with models and teams from frontier labs
- Opportunity to publish papers at top-tier AI/ML conferences
- Contribute to industry-standard open-source benchmarks
- Competitive salary
- Hybrid environment with an on-site research team
