ABBYY

Senior Machine Learning Engineer, Synthetic Data, Document Understanding

Design and implement pipelines that analyze real documents to inform high-fidelity synthetic data generation.

Posted 5/21/2026 • Full-time • Bangalore • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
Cloud, Python, PyTorch

About the role

Key responsibilities & impact
  • Design and implement pipelines that analyze real documents to inform high-fidelity synthetic data generation
  • Build generative systems capable of producing documents across diverse formats, layouts, and domains
  • Develop evaluation frameworks to ensure synthetic data maintains distributional fidelity and diversity
  • Research and apply generative modeling techniques suited for document AI training
  • Identify and mitigate quality issues to ensure synthetic data is effective for downstream model training
  • Partner with Modeling teams to measure the impact of synthetic data on model performance
  • Own the synthetic data generation track end-to-end, from architecture to quality validation
  • Drive architectural decisions balancing quality, diversity, scale, and cost efficiency
  • Define and maintain data quality metrics and generation dashboards
  • Collaborate closely with annotation teams to ensure compatibility with downstream pipelines
  • Contribute to roadmap planning alongside Principal-level leadership
  • Build scalable pipelines capable of generating millions of synthetic training examples
  • Implement post-processing, filtering, and validation mechanisms to remove low-quality outputs
  • Design cost-efficient workflows balancing compute, quality, and throughput
  • Develop monitoring systems to detect distribution shifts or quality degradation over time
  • Collaborate with Platform teams on compute orchestration, storage, and scheduling

Requirements

What you’ll need
  • MS or PhD in Computer Science, Engineering, Mathematics, or related field
  • 5+ years of experience in Machine Learning / AI, with focus on:
      • Generative models
      • Vision-Language Models (VLMs)
      • Synthetic data systems
  • Proven experience building and evaluating synthetic data pipelines for ML training
  • Strong background in data quality evaluation and statistical analysis
  • Deep expertise in Vision-Language Models and document understanding (layout, structure, semantics)
  • Strong knowledge of generative modeling for structured and semi-structured data
  • Understanding of what makes synthetic data valuable:
      • Distributional fidelity
      • Diversity
      • Realistic noise patterns
      • Domain coverage
  • Strong programming skills in Python with experience in PyTorch or similar frameworks
  • Experience evaluating data quality via automated metrics and downstream model impact
  • Familiarity with large-scale data pipelines, cloud environments, and experiment tracking
  • Proven ability to independently own complex technical workstreams
  • Strong collaboration across data, modeling, and platform teams
  • Ability to clearly communicate data quality and generation trade-offs
  • Data-driven mindset with strong attention to coverage gaps and quality signals

Benefits

Comp & perks
  • Comprehensive medical, accidental, and life insurance
  • Weekly wellness sessions to support your physical and mental well-being
  • A generous paid time off policy

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
generative models, Vision-Language Models, synthetic data systems, data quality evaluation, statistical analysis, Python, PyTorch, data pipelines, cloud environments, automated metrics
Soft Skills
collaboration, communication, data-driven mindset, attention to detail, independent ownership, roadmap planning, architectural decision-making, quality validation, problem-solving, impact measurement
Certifications
MS in Computer Science, PhD in Computer Science, Engineering, Mathematics