
Data Engineer
BJAK
full-time
Location Type: Remote
Location: Remote • 🇮🇩 Indonesia
Job Level
Mid-Level, Senior
Tech Stack
Python, SQL
About the role
- Collect, clean, and preprocess user-generated text and image data for fine-tuning large models
- Design and manage scalable data labeling pipelines, leveraging both crowdsourcing and in-house labeling teams
- Build and maintain automated dataset pipelines for content moderation (e.g., safe vs. unsafe content)
- Collaborate with researchers and engineers to ensure datasets are high-quality, diverse, and aligned with model training needs
Requirements
- Proven experience preparing datasets for machine learning or fine-tuning large models
- Strong skills in data cleaning, preprocessing, and transformation for both text and image data
- Hands-on experience with data labeling workflows and quality assurance for labeled data
- Familiarity with building and maintaining moderation datasets (safety, compliance, and filtering)
- Proficiency in scripting (Python, SQL) and working with large-scale data pipelines
Benefits
- Flat structure & real ownership
- Full involvement in setting direction and in consensus decision-making
- Flexibility in work arrangement
- High-impact role with visibility across product, data, and engineering
- Top-of-market compensation and performance-based bonuses
- Global exposure to product development
- Lots of perks: housing rental subsidies, a quality company cafeteria, and overtime meals
- Health, dental & vision insurance
- Global travel insurance (for you & your dependents)
- Unlimited, flexible time off