
Senior Language Data Scientist
Innodata Inc.
full-time
Location Type: Remote
Location: New Jersey • United States
About the role
- Lead long-term projects with high complexity and ambiguity, from the first discussion with the client through to completion
- Design and improve workflows that create data for AI/ML training and evaluation, including human annotation and data-collection workflows as well as synthetic ones
- Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers
- Critically assess annotation tooling and workflows
- Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance
- Work closely with client stakeholders on understanding goals, gathering requirements, proposing solutions, and executing them.
- Set an ambitious research agenda for improving our products and services
- Contribute to establishing best practices and standards for generative AI development with customers and within the organization
Requirements
- MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences, or a related scientific / quantitative field; PhD strongly preferred
- Ability to collaborate directly with technical stakeholders including senior project managers, data engineers, and research scientists.
- Experience collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals
- Ability to design efficient data strategies for complex long-term projects, potentially involving multiple teams and workflows.
- Knowledge of how the components of GenAI products and services work together
- Experience developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and nontechnical stakeholders
- Familiarity with GenAI technologies that enables you to improve existing processes to handle future challenges.
- Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows.
- Deep understanding of language and its relationship with culture
- Ability to identify ambiguity and subjectivity in language
- Ability to work on multilingual and multimodal projects
- Advanced knowledge of statistics, metrics (e.g. F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.
- Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
- Proficiency in Python to handle and transform large datasets (e.g. pre- and postprocessing data with pandas), perform quantitative analyses, and visualize data (e.g. matplotlib, seaborn)
- Deep understanding of data pipelines to support ML and NLP workflows
- Knowledge of efficient data collection, transformation, and storage
- Knowledge of data structures, algorithms, and data engineering principles
- Excellent interpersonal skills for effective cross-functional stakeholder engagement
- Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions
- Ability to work independently and collaborate as part of a team
- Adaptable to changing technologies and methodologies
- Ability to apply experience and research and development information to understand client products and services.
Benefits
- Providing technical mentorship and guidance to junior team members
- Professional development opportunities
Applicant Tracking System Keywords
Hard Skills & Tools
data analysis, statistical analysis, Natural Language Processing, Python, data pipelines, data collection, data transformation, data storage, metrics, human evaluation tasks
Soft Skills
interpersonal skills, problem-solving skills, collaboration, critical thinking, adaptability, communication, innovation, independence, creativity, stakeholder engagement