
Senior AI Engineer, NLP, Training Data
Coupa Software
Full-time
Location Type: Remote
Location: India
About the role
- Design and implement training data generation pipelines, including synthetic data generation.
- Build data labeling and annotation workflows with quality validation loops.
- Convert enterprise data into formats suitable for model training (instruction-tuning pairs, embeddings).
- Implement active learning strategies to identify high-value training examples.
- Collaborate with domain experts to validate training data quality and relevance.
- Build automated data quality checks: coverage, balance, consistency.
- Design training data versioning and lineage tracking.
- Analyze model evaluation results to identify training data gaps.
Requirements
- 5+ years of software engineering experience, with 2+ years in NLP, data science, or ML data engineering.
- Experience with text processing, tokenization, and NLP pipelines.
- Hands-on experience with data labeling tools and annotation workflows.
- Experience generating synthetic training data using language model APIs.
- Understanding of instruction-tuning and training data quality metrics.
- Proficiency in Python (pandas, PySpark).
- Experience with data versioning tools is a plus.
- BS/MS in Computer Science, NLP, or equivalent experience.
Benefits
- Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend.
- Collaborative Culture: We value collaboration and teamwork, and our culture is driven by transparency, openness, and a shared commitment to excellence.
- Global Impact: Join a company where your work has a global, measurable impact on our clients, the business, and each other.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
data generation pipelines, synthetic data generation, data labeling, annotation workflows, active learning strategies, data quality checks, Python, pandas, PySpark, text processing
Soft Skills
collaboration, quality validation, analysis
Certifications
BS in Computer Science, MS in Computer Science