
AI/ML Research Engineer, LLM Post-Training & Evaluation
Innodata Inc.
Full-time
Location Type: Remote
Location: New Jersey • United States
About the role
- Lead or co-lead technically complex ML engineering projects from initial customer discussions through implementation and delivery
- Design, build, and improve LLM training and post-training pipelines, including data ingestion, preprocessing, fine-tuning, evaluation, and experiment tracking
- Implement and optimize evaluation systems for LLMs and multimodal models, including offline benchmarks and task-specific test harnesses
- Integrate human-in-the-loop and AI-augmented evaluation signals into model development workflows
- Build robust infrastructure and tooling for reproducible experimentation, metrics logging, and regression monitoring
- Diagnose model behavior and pipeline failures, including data issues, training instability, metric inconsistencies, and evaluation drift
- Collaborate with Language Data Scientists and Applied Research Scientists to translate evaluation frameworks into executable systems
- Work closely with customer technical stakeholders to understand goals, constraints, and success criteria; propose and implement technically sound solutions
- Contribute to internal research and platform development, including benchmark frameworks, evaluation tooling, and post-training workflow improvements
- Contribute to best practices and standards for LLM training, evaluation, and quality assurance across projects
- Mentor junior engineers and contribute to technical design reviews, documentation, and engineering rigor across the team
Requirements
- BS/MS/PhD in Computer Science, Machine Learning, AI, Applied Mathematics, or related quantitative technical field (MS/PhD preferred)
- 2-3 years of relevant industry or research engineering experience in ML/AI systems
- Hands-on experience with LLM training / fine-tuning / post-training, including at least one of: supervised fine-tuning (SFT); preference optimization (e.g., DPO or related methods); RLHF / RLAIF-style workflows; task- or domain-adaptation of foundation models
- Strong programming skills in Python and experience building production-quality ML code
- Experience with modern ML frameworks (e.g., PyTorch, JAX, TensorFlow) and model libraries/tooling (e.g., Hugging Face ecosystem, vLLM, distributed training stacks)
- Experience designing and implementing evaluation pipelines for LLM/ML systems, including metrics computation, dataset handling, and experiment comparisons
- Strong understanding of data pipelines and ML systems engineering, including reproducibility, observability, and debugging
- Experience with large-scale distributed ML systems and performance optimization for training/evaluation workloads (GPU/accelerator environments preferred)
- Experience with large-scale data processing and workflow orchestration in support of model training/evaluation
- Ability to collaborate directly with technical stakeholders including research scientists, ML engineers, data engineers, and customer technical leads
- Strong written and verbal communication skills, including the ability to explain complex technical tradeoffs to both technical and non-technical audiences
Benefits
- Health insurance
- Retirement plans
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
machine learning engineering, LLM training, fine-tuning, evaluation systems, data ingestion, Python programming, ML frameworks, evaluation pipelines, distributed training, data processing
Soft Skills
collaboration, mentoring, communication, problem-solving, technical design reviews
Certifications
BS in Computer Science, MS in Computer Science, PhD in Computer Science, BS in Machine Learning, MS in Machine Learning, PhD in Machine Learning, BS in AI, MS in AI, PhD in AI, BS in Applied Mathematics