Machine Learning Engineer – Multilingual Data

Featherless AI

Full-time

Posted on: 1/22/2026

Location Type: Remote

Location: Anywhere in the World

Job Level

Mid-Level, Senior

Tech Stack

Python, Ray, Spark

About the role

Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages
Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling
Implement quality filters using statistical, heuristic, and model-based methods
Work with researchers to define language coverage, benchmarks, and evaluation metrics
Analyze dataset bias, coverage gaps, and failure modes across regions and scripts
Support training, fine-tuning, and distillation workflows with high-quality multilingual data
Continuously iterate on datasets based on model performance and real-world usage

Requirements

3+ years of experience as an ML Engineer, Applied Scientist, or similar role
Strong experience working with multilingual or non-English datasets
Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)
Experience building scalable data pipelines (Python, Spark, Ray, or similar)
Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks
Comfort collaborating with researchers and translating research needs into production systems

Benefits

Competitive compensation plus meaningful equity at the Series A stage

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

data pipelines, multilingual datasets, NLP fundamentals, tokenization, embeddings, language modeling, quality filters, statistical methods, heuristic methods, model-based methods

Soft Skills

collaboration, communication, problem-solving