ItsaCheckmate

Data Architect – Annotation

Employment Type: Full-time

Location Type: Remote

Location: India

About the role

  • Act as the transition point between Prompt Engineering and Data Labeling, translating model and product requirements into concrete data and annotation workflows.
  • Design, implement, and maintain scalable data workflows for dataset generation, curation, and ongoing maintenance.
  • Ensure data quality and consistency across labeling projects, with a focus on operational reliability for production AI systems.
  • Create, review, and maintain high-quality annotations across multiple modalities, including text, audio, conversational transcripts, and structured datasets.
  • Identify labeling inconsistencies, data errors, and edge cases; propose and enforce corrective actions and improvements to annotation standards.
  • Utilize platforms such as Labelbox, Label Studio, or Langfuse to manage large-scale labeling workflows and enforce consistent task execution.
  • Use Python and SQL for data extraction, validation, transformation, and workflow automation across labeling pipelines (a short extraction-and-validation sketch follows this list).
  • Leverage LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation of annotation outputs (see the LLM-review sketch after this list).
  • Implement automated QA checks and anomaly-detection mechanisms to scale quality assurance for large datasets.
  • Analyze annotation performance metrics and quality trends to surface actionable insights that improve labeling workflows and overall data accuracy.
  • Apply statistical analysis to detect data anomalies, annotation bias, and quality issues, and partner with stakeholders to mitigate them.
  • Collaborate with ML and Operations teams to refine labeling guidelines and enhance instructions based on observed patterns and error modes.
  • Work closely with Prompt Engineering, Data Labeling, and ML teams to ensure that data operations align with model requirements and product goals.
  • Document data standards, annotation guidelines, and workflow best practices for use by internal teams and external labeling partners.
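
For context on the Python/SQL bullet above, here is a minimal, hypothetical sketch of an extraction-and-validation step. The table name, columns, and label set are illustrative assumptions, not this role's actual schema.

```python
# Hypothetical sketch: pull annotation rows from a SQL store and run basic
# validation before they enter a labeling pipeline. The table and column
# names (annotations, text, label, annotator_id) are assumptions.
import sqlite3

ALLOWED_LABELS = {"positive", "negative", "neutral"}  # assumed label set

def extract_and_validate(db_path: str) -> list[dict]:
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT id, text, label, annotator_id FROM annotations"
    ).fetchall()
    conn.close()

    valid, errors = [], []
    for row in rows:
        record = dict(row)
        if not record["text"] or not record["text"].strip():
            errors.append((record["id"], "empty text"))
        elif record["label"] not in ALLOWED_LABELS:
            errors.append((record["id"], f"unknown label {record['label']!r}"))
        else:
            valid.append(record)

    print(f"{len(valid)} valid rows, {len(errors)} flagged for review")
    return valid
```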
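The LLM-review bullet can be read as a pattern like the following hedged sketch. The OpenAI client is one possible backend; the model name, rubric wording, and PASS/FAIL protocol are assumptions.

```python
# Hypothetical sketch of a prompt-based QA pass over annotation outputs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "You are reviewing a data annotation. Given the source text and the "
    "assigned label, answer PASS if the label follows the guideline "
    "'label the sentiment of the customer message', otherwise answer FAIL "
    "with a one-line reason."
)

def review_annotation(text: str, label: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; any capable LLM endpoint would do
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Text: {text}\nLabel: {label}"},
        ],
        temperature=0,  # keep reviews repeatable
    )
    return response.choices[0].message.content.strip()

# Route anything that does not PASS to human triage.
verdict = review_annotation("The driver was late and rude.", "positive")
if not verdict.startswith("PASS"):
    print("Needs human review:", verdict)
```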

Requirements

  • Experience with data annotation and hands-on use of platforms such as Labelbox, Label Studio, or Langfuse for managing large-scale labeling workflows.
  • Proficiency in Python and SQL for data extraction, validation, and workflow automation in a data operations or data engineering context.
  • Hands-on experience using LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation.
  • Demonstrated experience working with large-scale / high-volume datasets.
  • At least one prior role where data workflow automation is explicitly part of the job scope or responsibilities.
  • Ability to perform statistical analysis to detect data anomalies, annotation bias, and quality issues (a short sketch follows this section).
  • Strong requirement-elicitation and communication skills, with a process-driven and detail-oriented mindset when working with cross-functional teams.

Qualifications

  • B.S. or higher in a quantitative discipline (Data Science, Computer Science, Engineering, or related field)
  • 5+ years of relevant experience with a B.S. degree, or 3+ years of experience with a Master's degree
  • Demonstrated proficiency in SQL for reporting and Python for automation and scripting
  • Academic or applied research experience with NLP or LLM benchmarking datasets is a strong plus
  • Must be flexible to work during US hours (until at least 1:30 PM EST) for this role
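
As one illustration of the statistical-analysis requirement, here is a small, hypothetical Python sketch that flags annotators whose disagreement with the per-item majority label is a z-score outlier. The input shape and threshold are assumptions.

```python
# Hypothetical sketch: flag annotators whose disagreement-with-majority rate
# is a statistical outlier. Each item maps annotator_id -> label (assumed).
from collections import Counter
from statistics import mean, pstdev

def flag_outlier_annotators(items: list[dict[str, str]], z_cut: float = 2.0):
    disagree, total = Counter(), Counter()
    for labels in items:
        majority, _ = Counter(labels.values()).most_common(1)[0]
        for annotator, label in labels.items():
            total[annotator] += 1
            if label != majority:
                disagree[annotator] += 1

    rates = {a: disagree[a] / total[a] for a in total}
    mu, sigma = mean(rates.values()), pstdev(rates.values())
    if sigma == 0:
        return []  # everyone disagrees equally often; nothing to flag
    return [a for a, r in rates.items() if (r - mu) / sigma > z_cut]

# Example with three annotators over two items; "a3" is likely flagged.
items = [
    {"a1": "pos", "a2": "pos", "a3": "neg"},
    {"a1": "pos", "a2": "pos", "a3": "neg"},
]
print(flag_outlier_annotators(items, z_cut=1.0))
```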

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
Python, SQL, data annotation, data extraction, data validation, workflow automation, statistical analysis, quality assurance, data curation, data maintenance
Soft skills
requirement elicitation, communication, detail-oriented, process-driven, collaboration