
Data Architect – Annotation
ItsaCheckmate
Full-time
Location Type: Remote
Location: India
About the role
- Act as the transition point between Prompt Engineering and Data Labeling, translating model and product requirements into concrete data and annotation workflows.
- Design, implement, and maintain scalable data workflows for dataset generation, curation, and ongoing maintenance.
- Ensure data quality and consistency across labeling projects, with a focus on operational reliability for production AI systems.
- Create, review, and maintain high-quality annotations across multiple modalities, including text, audio, conversational transcripts, and structured datasets.
- Identify labeling inconsistencies, data errors, and edge cases; propose and enforce corrective actions and improvements to annotation standards.
- Utilize platforms such as Labelbox, Label Studio, or Langfuse to manage large-scale labeling workflows and enforce consistent task execution.
- Use Python and SQL for data extraction, validation, transformation, and workflow automation across labeling pipelines.
- Leverage LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation of annotation outputs.
- Implement automated QA checks and anomaly-detection mechanisms to scale quality assurance for large datasets.
- Analyze annotation performance metrics and quality trends to surface actionable insights that improve labeling workflows and overall data accuracy.
- Apply statistical analysis to detect data anomalies, annotation bias, and quality issues, and partner with stakeholders to mitigate them.
- Collaborate with ML and Operations teams to refine labeling guidelines and enhance instructions based on observed patterns and error modes.
- Work closely with Prompt Engineering, Data Labeling, and ML teams to ensure that data operations align with model requirements and product goals.
- Document data standards, annotation guidelines, and workflow best practices for use by internal teams and external labeling partners.
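The automated QA and anomaly-detection responsibilities above could be sketched as a small statistical check over annotator agreement rates. This is an illustrative example only (the function name, data, and threshold are assumptions, not part of the role description):

```python
from statistics import mean, stdev

def flag_anomalous_annotators(per_annotator_agreement, z_threshold=1.5):
    """Flag annotators whose agreement rate deviates strongly from the pool.

    per_annotator_agreement: dict mapping annotator id -> agreement rate (0..1).
    Returns ids whose z-score exceeds the threshold (illustrative QA check).
    """
    rates = list(per_annotator_agreement.values())
    if len(rates) < 2:
        return []  # not enough annotators to compare
    mu, sigma = mean(rates), stdev(rates)
    if sigma == 0:
        return []  # all annotators agree equally; nothing to flag
    return [
        annotator
        for annotator, rate in per_annotator_agreement.items()
        if abs(rate - mu) / sigma > z_threshold
    ]

# Example: one annotator agrees far less often than the rest.
agreement = {"a1": 0.95, "a2": 0.93, "a3": 0.96, "a4": 0.94, "a5": 0.40}
print(flag_anomalous_annotators(agreement))  # → ['a5']
```

In practice a check like this would run inside the labeling pipeline (e.g., against exports from Labelbox or Label Studio) so that outlier annotators are surfaced before their work reaches production datasets.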
Requirements
- Experience with data annotation and hands-on use of platforms such as Labelbox, Label Studio, or Langfuse for managing large-scale labeling workflows.
- Proficiency in Python and SQL for data extraction, validation, and workflow automation in a data operations or data engineering context.
- Hands-on experience using LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation.
- Demonstrated experience working with large-scale / high-volume datasets.
- At least one prior role where data workflow automation is explicitly part of the job scope or responsibilities.
- Ability to perform statistical analysis to detect data anomalies, annotation bias, and quality issues.
- Strong requirement-elicitation and communication skills, with a process-driven and detail-oriented mindset when working with cross-functional teams.
Qualifications
- B.S. or higher in a quantitative discipline (Data Science, Computer Science, Engineering, or related field)
- 5+ years of relevant experience with a B.S. degree, or 3+ years of experience with a Master's degree
- Demonstrated proficiency in SQL for reporting and Python for automation and scripting
- Academic or applied research experience related to NLP or LLM benchmarking datasets is a strong plus
- Must be flexible to work during US hours (until at least 1:30 PM EST)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Python, SQL, data annotation, data extraction, data validation, workflow automation, statistical analysis, quality assurance, data curation, data maintenance
Soft skills
requirement elicitation, communication, detail-oriented, process-driven, collaboration