poolside

Engineering Member – Pre-training, Data Research

Data role focused on improving dataset quality for AI model training at Poolside. Collaborate with teams to ensure high-quality datasets for large training volumes.

Posted 5/19/2026 • Full-time • Remote • 🇪🇺 Anywhere in Europe • Mid-Level, Senior

Tech Stack

Tools & technologies
Python

About the role

Key responsibilities & impact
  • You’ll be working on our data team focused on the quality of the datasets being delivered for training our models.
  • This is a hands-on role where your #1 mission will be to improve the quality of the pretraining datasets by leveraging your previous experience, intuition, and training experiments.
  • This includes synthetic data generation and data mix optimization.
  • You’ll collaborate closely with other teams like Pretraining, Post-training, Evals, and Product to define high-quality data needs that map to missing model capabilities and downstream use cases.
  • Staying in sync with the latest research in the fields of dataset design and pretraining is key to success in this role.
  • You will constantly lead original research initiatives through short, time-bounded experiments while deploying highly technical engineering solutions into production.
  • Because the volumes of data to process are massive, you'll have a performant distributed data pipeline and a large GPU cluster at your disposal.

Requirements

What you’ll need
  • Strong machine learning and engineering background
  • Experience with Large Language Models (LLMs), including:
      • Understanding of transformer architectures and how LLMs learn
      • Data ablations and scaling laws
      • Mid-training and post-training techniques
      • Training reasoning and agentic models
  • Experience with evals tracking model capabilities (general knowledge, reasoning, math, coding, long-context, etc.)
  • Experience in building trillion-scale pretraining datasets, and familiarity with concepts like data curation, deduplication, data mixing, tokenization, curriculum, impact of data repetition, etc.
  • Excellent programming skills in Python
  • Strong prompt engineering skills
  • Experience working with large-scale GPU clusters and distributed data pipelines
  • Strong obsession with data quality
  • Research experience:
      • Authorship of scientific papers on topics such as applied deep learning, LLMs, or source code generation is a nice-to-have
      • Can freely discuss the latest papers and go into fine detail
      • Is reasonably opinionated

Benefits

Comp & perks
  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • Health insurance allowance for you & dependents
  • Company-provided equipment
  • Well-being, always-be-learning & home office allowances
  • Frequent team get-togethers
  • Diverse & inclusive people-first culture

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
machine learning, Large Language Models, transformer architectures, data ablations, scaling laws, mid-training techniques, post-training techniques, data curation, deduplication, tokenization
Soft Skills
strong obsession with data quality, excellent programming skills, strong prompt engineering skills, research experience, ability to discuss latest papers, opinionated