
Member of Technical Staff, Training Engineer – Large Scale Foundation Models
FirstPrinciples Holding Company
Full-time
Location Type: Remote
Location: Remote • 🇨🇦 Canada
Job Level
Lead
Tech Stack
Node.js, PyTorch
About the role
- Develop and lead end-to-end pre-training of large language models on GPU clusters.
- Combine deep engineering expertise with research intuition.
- Build data pipelines and perform distributed training at scale.
- Make informed decisions about microbatch and global batch configurations.
- Provide the executive team with strategic insight into the financial implications of training runs.
- Design capital allocation frameworks that keep large-scale training sustainable.
- Operate distributed training infrastructure using modern techniques.
- Write production-grade PyTorch and Triton/CUDA kernels when required.
- Lead cross-functional efforts and mentor engineers.
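For candidates less familiar with the batch terminology in the responsibilities above, a minimal sketch of the arithmetic behind microbatch and global batch configuration (function name and numbers are illustrative, not taken from the posting):

```python
def global_batch_size(microbatch: int, grad_accum_steps: int, dp_ranks: int) -> int:
    """Effective samples per optimizer step: each data-parallel rank runs
    `microbatch` samples per forward/backward pass, accumulates gradients
    over `grad_accum_steps` passes, then all ranks synchronize."""
    return microbatch * grad_accum_steps * dp_ranks

# e.g. 4 samples per GPU, 8 accumulation steps, 64 data-parallel GPUs
print(global_batch_size(4, 8, 64))  # -> 2048
```

In practice the microbatch is capped by per-GPU memory, while the global batch is a training-dynamics choice, so gradient accumulation and the data-parallel degree are the knobs used to reconcile the two.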
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or related field.
- 7-12+ years of total experience, including 2+ years training large Transformers at scale.
- Hands-on experience with at least one frontier-style training run.
- Expert-level proficiency in PyTorch (including compiled mode/torch.compile).
- Deep facility with distributed frameworks (PyTorch FSDP or DeepSpeed ZeRO).
- Proven success operating multi-node GPU jobs.
- Demonstrated impact from data quality work.
- Strong applied mathematics background.
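As a taste of the sharded-training reasoning the requirements allude to, here is a back-of-envelope per-GPU memory estimate under full sharding (ZeRO-3 / FSDP style); the byte counts assume mixed precision with Adam and are a common rule of thumb, not a figure from the posting:

```python
def sharded_state_gib(n_params: float, n_shards: int, bytes_per_param: int = 16) -> float:
    """Per-GPU memory for parameters, gradients, and optimizer state when fully
    sharded: fp16 params (2 B) + fp16 grads (2 B) + fp32 master weights (4 B)
    + Adam first/second moments (4 B + 4 B) = 16 B per parameter, split across
    shards. Activations and buffers are ignored."""
    return n_params * bytes_per_param / n_shards / 2**30

# A 7B-parameter model sharded over 64 GPUs: roughly 1.6 GiB of state per GPU
print(round(sharded_state_gib(7e9, 64), 1))  # -> 1.6
```

The same arithmetic explains why unsharded data parallelism breaks down at this scale: with `n_shards=1`, the state alone exceeds 100 GiB per GPU.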
Benefits
- Health insurance
- Innovative research environment
- Collaboration with top experts
- Opportunity to work on groundbreaking technology
- Flexible remote work
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
large language models, GPU clusters, data pipelines, distributed training, microbatch configurations, global batch configurations, PyTorch, Triton, CUDA, applied mathematics
Soft skills
leadership, mentoring, strategic insights, cross-functional collaboration
Certifications
Bachelor's degree, Master's degree