Aleph Alpha

AI Software Engineer – Model Evaluation

Full-time

Location Type: Hybrid

Location: Heidelberg, Germany


Job Level

Mid-Level, Senior

Tech Stack

Distributed Systems, PyTorch

About the role

  • As an AI Software Engineer in Model Evaluation, you will help design, implement, and scale the systems that measure our models’ performance at the cutting edge.
  • You will work closely with researchers to create evaluation benchmarks, datasets, and environments that test model capabilities, safety, and reliability across tasks from multilingual understanding to mathematical reasoning and creativity.
  • You will own significant portions of our evaluation infrastructure, including dataset generation pipelines, automated benchmarking tools, analysis dashboards, and large-scale evaluation orchestration on our compute clusters.
  • You’ll be building tools and experiments that drive product decisions, shape research priorities, and guide responsible deployment of our models.
  • This is high-scale, high-impact engineering: you’ll work with petabyte-scale data, run evaluations across large-scale distributed GPU clusters, and deliver insights that inform the direction of Aleph Alpha’s research.

Requirements

  • Bachelor’s degree in computer science, engineering, or a related field.
  • Willingness to work in Germany. Our primary working locations are Heidelberg (preferred) and Berlin, with some flexibility to work from other locations in Germany; regular travel to Heidelberg, potentially weekly, is expected.
  • Proficiency in programming and a passion for crafting high-quality, maintainable software while following engineering best practices (e.g., TDD, DDD).
  • Curiosity to dig deep into how models work and how to measure their capabilities.
  • Desire to take ownership of problems and collaborate with other teams to solve them.
  • Motivation to learn AI-related topics and get up-to-speed with the cutting edge.
  • Strong communication skills, with the ability to convey technical solutions to diverse audiences.
  • Master’s (or PhD) degree in computer science or related fields. (Preferred)
  • Familiarity with evaluation and benchmarking frameworks for AI models. (Preferred)
  • Experience working with distributed systems for large-scale data processing or evaluation orchestration. (Preferred)
  • Experience in dataset creation, annotation, and curation for complex AI tasks. (Preferred)
  • Familiarity with LLM architectures, popular NLP tools (e.g., PyTorch, HF Transformers), and automated evaluation techniques (e.g., LLM-as-a-judge, multi-turn evaluation). (Preferred)
  • Experience designing evaluations for safety, trustworthiness, and bias in AI systems. (Preferred)
  • Strong skills in data visualization, dashboarding, and reporting for evaluation results. (Preferred)
  • Familiarity with cluster management systems, model/data lineage, and MLOps workflows. (Preferred)

Benefits

  • 30 days of paid vacation
  • Access to a variety of fitness & wellness offerings via Wellhub
  • Mental health support through nilo.health
  • Substantially subsidized company pension plan for your future security
  • Subsidized Germany-wide transportation ticket
  • Budget for additional technical equipment
  • Flexible working hours for better work-life balance and hybrid working model
  • Virtual Stock Option Plan
  • JobRad® Bike Lease

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
programming, TDD, DDD, evaluation frameworks, benchmarking frameworks, distributed systems, dataset creation, data visualization, MLOps workflows, LLM architectures
Soft skills
communication, curiosity, ownership, collaboration, motivation
Certifications
Bachelor’s degree, Master’s degree, PhD