Tether.to

AI Research Engineer – Multi-Modal, Vision

Posted 5/19/2026 • Full-time • Remote • 🇮🇹 Italy • Mid-Level, Senior

About the role

Key responsibilities & impact
  • Conduct end-to-end research and engineering on vision-language models, covering training, evaluation, and optimization across the full model development lifecycle.
  • Design and implement post-training pipelines including supervised fine-tuning, knowledge distillation, and reinforcement learning from human feedback.
  • Develop and maintain high-quality multimodal datasets, including data curation, filtering, and balancing for domain-specific tasks.
  • Drive model efficiency and deployability, adapting models for resource-constrained environments using compression and optimization techniques.
  • Design and implement evaluation frameworks and benchmarks to measure model performance, robustness, and real-world task success.
  • Build and scale training workflows across distributed GPU infrastructure.
  • Identify and resolve bottlenecks in training pipelines to achieve state-of-the-art model quality on target benchmarks.
  • Contribute to and leverage open-source ecosystems, including models, datasets, and tooling, to accelerate development.
  • Stay current with the latest research in multimodal learning and vision-language systems, translating relevant findings into practical improvements.
  • Publish research findings in top-tier AI conferences and journals where applicable.

Requirements

What you’ll need
  • Degree in Computer Science, Machine Learning, or a related field; MS/PhD preferred.
  • Strong experience with multimodal post-training workflows including supervised fine-tuning, knowledge distillation, and reinforcement learning from feedback.
  • Hands-on experience with parameter-efficient fine-tuning and distributed training frameworks.
  • Demonstrated ability to build and improve vision-language models with measurable results on standard benchmarks or real-world tasks.
  • Experience adapting models for resource-constrained environments.
  • Proven open-source contributions in multimodal AI on GitHub or HuggingFace.
  • Publications at top AI conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ECCV).

Benefits

Comp & perks
  • Remote work
  • Flexible work hours
  • Professional development opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
vision-language models, supervised fine-tuning, knowledge distillation, reinforcement learning, multimodal datasets, model optimization, distributed training frameworks, parameter-efficient fine-tuning, model evaluation frameworks, data curation
Soft Skills
problem-solving, research, communication, collaboration, adaptability, critical thinking, creativity, attention to detail, time management, leadership
Certifications
PhD in Computer Science, MS in Machine Learning