BJAK

Data Engineer

full-time

Origin: 🇺🇸 United States

Job Level

Mid-Level, Senior

Tech Stack

Python, SQL

About the role

  • Collect, clean, and preprocess user-generated text and image data for fine-tuning large models
  • Design and manage scalable data labeling pipelines, leveraging both crowdsourcing and in-house labeling teams
  • Build and maintain automated datasets for content moderation (e.g., safe vs unsafe content)
  • Collaborate with researchers and engineers to ensure datasets are high-quality, diverse, and aligned with model training needs

Requirements

  • Proven experience preparing datasets for machine learning or fine-tuning large models
  • Strong skills in data cleaning, preprocessing, and transformation for both text and image data
  • Hands-on experience with data labeling workflows and quality assurance for labeled data
  • Familiarity with building and maintaining moderation datasets (safety, compliance, and filtering)
  • Proficiency in scripting (Python, SQL) and working with large-scale data pipelines