Training Infrastructure Engineer

Mirelo AI

Mirelo AI is hiring a Training Infrastructure Engineer to focus on optimizing GPU performance and training pipelines. The work includes designing and maintaining infrastructure for efficient model training.

Posted 4/12/2026 • full-time • Berlin • 🇩🇪 Germany • Mid-Level, Senior

Tech Stack

Tools & technologies

PyTorch

About the role

Key responsibilities & impact

Focus on the full training stack: profiling GPU behavior and debugging training pipelines
Improve throughput and choose the right parallelism strategies
Design the infrastructure for efficient model training at scale
Work across cluster management, model training, efficient data pipelines, inference, and PyTorch code optimization

Requirements

What you’ll need

Familiarity with the latest and most effective techniques for optimizing training and inference workloads, gained by implementing them rather than by reading papers
Deep understanding of GPU memory hierarchy and computation capabilities
Experience optimizing for both memory-bound and compute-bound operations
Expertise with efficient attention algorithms and their performance characteristics at different scales
Nice to have: experience implementing custom GPU kernels and integrating them into PyTorch
Familiarity with high-performance storage solutions and understanding of their performance characteristics for ML workloads
Experience with managing SLURM clusters at scale

Benefits

Comp & perks

Competitive compensation and equity
True ownership from day one
Join at a pivotal moment
Build for the next generation of creators

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

GPU profiling
debugging training pipelines
parallelism strategies
model training infrastructure
data pipelines
PyTorch optimization
memory-bound operations
compute-bound operations
efficient attention algorithms
custom GPU kernels