Tether.to

AI Research Engineer, Model Compression, Quantization

Posted 5/19/2026 • Full-time • Remote • 🇮🇹 Italy • Mid-Level, Senior

Tech Stack

Tools & technologies
PyTorch

About the role

Key responsibilities & impact
  • Drive innovation in model compression and efficient deployment for advanced multimodal AI systems, including large language models (LLMs) and vision-language models (VLMs).
  • Reduce model footprint and computational cost while preserving accuracy, enabling high-performance AI to run efficiently across resource-constrained edge devices.
  • Apply and advance compression techniques such as quantization, knowledge distillation, and pruning.
  • Build robust compression pipelines, establish performance and fidelity metrics, and address bottlenecks in production inference.

Requirements

What you’ll need
  • A degree in Computer Science or related field.
  • Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).
  • Experience with PyTorch or an equivalent deep learning framework.
  • Hands-on experience with model quantization, including both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).
  • Research and hands-on experience with knowledge distillation for compressing large models into smaller, efficient ones.
  • Research and hands-on experience with model pruning for compressing large models into smaller, efficient ones.
  • Solid understanding of neural network architectures and training processes, including transformers (e.g., LLMs, VLMs), backpropagation, optimization, and fine-tuning techniques.
  • Familiarity with C++ is a plus (especially for implementing low-level quantization kernels or inference optimizations).

Benefits

Comp & perks
  • Competitive salary
  • Flexible work hours
  • Professional development budget
  • Home office setup allowance
  • Global team events
