Mirelo AI

Training Infrastructure Engineer

Employment Type: Full-time

Location Type: Hybrid

Location: Berlin, Germany

About the role

  • Focus on the full training stack: profiling GPU behavior and debugging training pipelines
  • Improve throughput and choose the right parallelism strategies
  • Design the infrastructure for efficient model training at scale
  • Work across cluster management, model training, efficient data pipelines, inference, and PyTorch code optimization

Requirements

  • Familiarity with the latest and most effective techniques for optimizing training and inference workloads, gained from implementing them rather than from reading papers
  • Deep understanding of GPU memory hierarchy and computation capabilities
  • Experience optimizing for both memory-bound and compute-bound operations
  • Expertise with efficient attention algorithms and their performance characteristics at different scales
  • Nice to have: experience implementing custom GPU kernels and integrating them into PyTorch
  • Familiarity with high-performance storage solutions and understanding of their performance characteristics for ML workloads
  • Experience managing Slurm clusters at scale

Benefits

  • Competitive compensation and equity
  • True ownership from day one
  • Join at a pivotal moment
  • Build for the next generation of creators