
Senior Performance Engineer – Pretraining
Aleph Alpha
Full-time
Location Type: Hybrid
Location: Heidelberg, Germany
About the role
- Engineer the systems required to train foundation models at scale.
- Maximize hardware utilization and training throughput on our large-scale GPU clusters.
- Work at the intersection of deep learning frameworks, distributed systems, and GPU microarchitecture.
Requirements
- Proficiency in Python and the PyTorch library.
- A strong engineering background in parallel and/or distributed systems, with a proven track record of excellence.
- Hands-on experience with modern machine learning techniques, especially large language models and their life cycle.
- A deep understanding of the CUDA programming model.
- Experience in distributed programming with APIs such as NCCL or MPI.
- Experience analysing profiling traces with tools such as PyTorch Profiler and NVIDIA Nsight.

Please note that this role requires regular on-site collaboration in Heidelberg as a member of the Training Efficiency Team.
Benefits
- 30 days of paid vacation
- Access to a variety of fitness & wellness offerings via Wellhub
- Mental health support through nilo.health
- JobRad® Bike Lease
- Substantially subsidized company pension plan for your future financial security
- Subsidized Germany-wide transportation ticket
- Budget for additional technical equipment
- Flexible working hours for better work-life balance and hybrid working model
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, PyTorch, CUDA, NCCL, MPI, machine learning, large language models, parallel systems, distributed systems, profiling
Soft Skills
engineering background, collaboration, excellence