Training Infrastructure Engineer
Mirelo AI
Training Infrastructure Engineer at Mirelo AI, focused on optimizing GPU performance and training pipelines. The work includes designing and maintaining infrastructure for efficient model training.
Tech Stack
Tools & technologies
- PyTorch
About the role
Key responsibilities & impact
- Focus on the full training stack: profiling GPU behavior and debugging training pipelines
- Improve throughput by choosing the right parallelism strategies
- Design the infrastructure for efficient model training at scale
- Work across cluster management, model training, efficient data pipelines, inference, and PyTorch code optimization
Requirements
What you’ll need
- Familiarity with the latest and most effective techniques for optimizing training and inference workloads, gained from implementing them rather than from reading papers
- Deep understanding of GPU memory hierarchy and computation capabilities
- Experience optimizing for both memory-bound and compute-bound operations
- Expertise with efficient attention algorithms and their performance characteristics at different scales
- Nice to Have: Experience in implementing custom GPU kernels and integrating them into PyTorch
- Familiarity with high-performance storage solutions and understanding of their performance characteristics for ML workloads
- Experience with managing SLURM clusters at scale
Benefits
Comp & perks
- Competitive compensation and equity
- True ownership from day one
- Join at a pivotal moment
- Build for the next generation of creators
ATS Keywords
Hard Skills & Tools
- GPU profiling
- debugging training pipelines
- parallelism strategies
- model training infrastructure
- data pipelines
- PyTorch optimization
- memory-bound operations
- compute-bound operations
- efficient attention algorithms
- custom GPU kernels