Hands-on experience managing GPU clusters on major cloud providers
Experience with distributed compute orchestration tools such as Kubernetes, Slurm, or equivalent
Working knowledge of distributed training concepts
Experience setting up, managing, and integrating ML experiment tracking systems
Strong Python proficiency and solid software engineering fundamentals
Ability to work in a fast-moving, iterative environment
Hands-on experience with post-training workflows is a plus


Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

GPU clusters, cloud providers, Kubernetes, Slurm, distributed training, ML experiment tracking, Python, software engineering

Soft Skills

ability to work in fast-moving environment, iterative work