Snorkel AI

Applied Research Engineer – Training Infra

Full-time

Location Type: Hybrid

Location: Redwood City, California, United States

Salary

💰 $150,000 - $180,000 per year

About the role

  • Own the infrastructure that powers model training and evaluation work
  • Build and operate GPU cluster infrastructure and training pipelines
  • Translate training requirements into robust, reproducible systems
  • Monitor and optimize cluster health and inter-node communication
  • Work closely with research scientists and ML engineers

Requirements

  • Hands-on experience managing GPU clusters on major cloud providers
  • Experience with distributed compute orchestration tools such as Kubernetes, Slurm, or equivalent
  • Working knowledge of distributed training concepts
  • Experience setting up, managing, and integrating ML experiment tracking tools
  • Strong Python proficiency and solid software engineering fundamentals
  • Ability to work in a fast-moving, iterative environment
  • Hands-on experience with post-training workflows is a plus

Benefits

  • Global team events
  • Professional development opportunities
  • Health insurance
  • Flexible working hours

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
GPU clusters, cloud providers, Kubernetes, Slurm, distributed training, ML experiment tracking, Python, software engineering
Soft Skills
Ability to work in a fast-moving environment, iterative work