
Machine Learning Engineer – Multilingual Data
Featherless AI
Employment Type: Full-time
Location Type: Remote
Location: Anywhere in the World
About the role
- Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages
- Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling
- Implement quality filters using statistical, heuristic, and model-based methods
- Work with researchers to define language coverage, benchmarks, and evaluation metrics
- Analyze dataset bias, coverage gaps, and failure modes across regions and scripts
- Support training, fine-tuning, and distillation workflows with high-quality multilingual data
- Continuously iterate on datasets based on model performance and real-world usage
Requirements
- 3+ years of experience as an ML Engineer, Applied Scientist, or in a similar role
- Strong experience working with multilingual or non-English datasets
- Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)
- Experience building scalable data pipelines (Python, Spark, Ray, or similar)
- Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks
- Comfort collaborating with researchers and translating research needs into production systems
Benefits
- Competitive compensation plus meaningful equity at the Series A stage
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
data pipelines, multilingual datasets, NLP fundamentals, tokenization, embeddings, language modeling, quality filters, statistical methods, heuristic methods, model-based methods
Soft Skills
collaboration, communication, problem-solving