
Data Architect – Annotation
ItsaCheckmate
Full-time
Location Type: Remote
Location: India
About the role
- Act as the transition point between Prompt Engineering and Data Labeling, translating model and product requirements into concrete data and annotation workflows.
- Design, implement, and maintain scalable data workflows for dataset generation, curation, and ongoing maintenance.
- Ensure data quality and consistency across labeling projects, with a focus on operational reliability for production AI systems.
- Create, review, and maintain high-quality annotations across multiple modalities, including text, audio, conversational transcripts, and structured datasets.
- Identify labeling inconsistencies, data errors, and edge cases; propose and enforce corrective actions and improvements to annotation standards.
- Utilize platforms such as Labelbox, Label Studio, or Langfuse to manage large-scale labeling workflows and enforce consistent task execution.
- Use Python and SQL for data extraction, validation, transformation, and workflow automation across labeling pipelines.
- Leverage LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation of annotation outputs.
- Implement automated QA checks and anomaly-detection mechanisms to scale quality assurance for large datasets.
- Analyze annotation performance metrics and quality trends to surface actionable insights that improve labeling workflows and overall data accuracy.
- Apply statistical analysis to detect data anomalies, annotation bias, and quality issues, and partner with stakeholders to mitigate them.
- Collaborate with ML and Operations teams to refine labeling guidelines and enhance instructions based on observed patterns and error modes.
- Work closely with Prompt Engineering, Data Labeling, and ML teams to ensure that data operations align with model requirements and product goals.
- Document data standards, annotation guidelines, and workflow best practices for use by internal teams and external labeling partners.
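The automated QA and anomaly-detection responsibilities above could be sketched as a small statistical check over annotator agreement rates. This is an illustrative example only (the function name, data, and threshold are assumptions, not part of the role description):

```python
from statistics import mean, stdev

def flag_anomalous_annotators(per_annotator_agreement, z_threshold=1.5):
    """Flag annotators whose agreement rate deviates strongly from the pool.

    per_annotator_agreement: dict mapping annotator id -> agreement rate (0..1).
    Returns ids whose z-score exceeds the threshold (illustrative QA check).
    """
    rates = list(per_annotator_agreement.values())
    if len(rates) < 2:
        return []  # not enough annotators to compare
    mu, sigma = mean(rates), stdev(rates)
    if sigma == 0:
        return []  # all annotators agree equally; nothing to flag
    return [
        annotator
        for annotator, rate in per_annotator_agreement.items()
        if abs(rate - mu) / sigma > z_threshold
    ]

# Example: one annotator agrees far less often than the rest.
agreement = {"a1": 0.95, "a2": 0.93, "a3": 0.96, "a4": 0.94, "a5": 0.40}
print(flag_anomalous_annotators(agreement))  # → ['a5']
```

In practice a check like this would run inside the labeling pipeline (e.g., against exports from Labelbox or Label Studio) so that outlier annotators are surfaced before their work reaches production datasets.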
Requirements
- Experience with data annotation and hands-on use of platforms such as Labelbox, Label Studio, or Langfuse for managing large-scale labeling workflows.
- Proficiency in Python and SQL for data extraction, validation, and workflow automation in a data operations or data engineering context.
- Hands-on experience using LLMs (e.g., GPT-4, Claude, Gemini) for prompt-based quality checks, automated review, and data validation.
- Demonstrated experience working with large-scale / high-volume datasets.
- At least one prior role where data workflow automation is explicitly part of the job scope or responsibilities.
- Ability to perform statistical analysis to detect data anomalies, annotation bias, and quality issues.
- Strong requirement-elicitation and communication skills, with a process-driven and detail-oriented mindset when working with cross-functional teams.
Qualifications
- B.S. or higher in a quantitative discipline (Data Science, Computer Science, Engineering, or related field)
- 5+ years of relevant experience with a B.S. degree, or 3+ years of experience with a Master's degree
- Demonstrated proficiency in SQL for reporting and Python for automation and scripting
- Academic or applied research experience related to NLP or LLM benchmarking datasets is a strong plus
- Must be flexible to work during US hours (until at least 1:30 PM EST)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Python, SQL, data annotation, data extraction, data validation, workflow automation, statistical analysis, quality assurance, data curation, data maintenance
Soft skills
requirement elicitation, communication, detail-oriented, process-driven, collaboration