Featherless AI

Machine Learning Engineer – Multilingual Data

Full-time

Location Type: Remote

Location: Anywhere in the World

About the role

  • Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages
  • Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling
  • Implement quality filters using statistical, heuristic, and model-based methods
  • Work with researchers to define language coverage, benchmarks, and evaluation metrics
  • Analyze dataset bias, coverage gaps, and failure modes across regions and scripts
  • Support training, fine-tuning, and distillation workflows with high-quality multilingual data
  • Continuously iterate on datasets based on model performance and real-world usage

Requirements

  • 3+ years of experience as an ML Engineer, Applied Scientist, or in a similar role
  • Strong experience working with multilingual or non-English datasets
  • Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)
  • Experience building scalable data pipelines (Python, Spark, Ray, or similar)
  • Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks
  • Comfort collaborating with researchers and translating research needs into production systems

Benefits

  • Competitive compensation and meaningful equity at the Series A stage