Tether.to

AI Research Engineer – Model Compression, Quantization

Apply low-bit quantization to reduce model size and inference latency for generative AI models (LLMs, VLMs, multimodal) while maintaining accuracy and output quality.

Posted 5/19/2026 • Full-time • Remote • 🇦🇪 United Arab Emirates • Mid-Level, Senior

Tech Stack

Tools & technologies
PyTorch

About the role

Key responsibilities & impact
  • Apply low-bit quantization to reduce model size and inference latency for generative AI models (LLMs, VLMs, multimodal) while maintaining accuracy and output quality.
  • Leverage knowledge distillation to transfer capabilities from larger teacher models to smaller student models, enabling efficient multimodal reasoning across text, image, and audio inputs.
  • Implement pruning techniques to remove redundant parameters and attention heads, reducing computational overhead without sacrificing task performance.
  • Analyze trade-offs between model efficiency (size, latency, memory) and accuracy across quantization, distillation, and pruning methods; propose improvements based on empirical findings.
  • Research and apply mixed-precision quantization and other advanced compression strategies (e.g., adaptive pruning schedules, distillation with intermediate feature matching) to optimize the accuracy–performance balance.
  • Stay current with the latest research in model compression, including emerging techniques for multimodal and generative architectures.
  • Document methodologies, experiments, and results clearly to support reproducibility, internal collaboration, and stakeholder communication.
  • Author technical papers and publish findings at top-tier conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ACL, AAAI) to advance the field of model compression for multimodal AI.

Requirements

What you’ll need
  • A degree in Computer Science or related field.
  • Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).
  • Experience with PyTorch or an equivalent deep learning framework.
  • Hands-on experience with model quantization including both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).
  • Research and hands-on experience with knowledge distillation for compressing large models into smaller, efficient ones.
  • Research and hands-on experience with model pruning for compressing large models into smaller, efficient ones.
  • Solid understanding of neural network architectures and training processes, including transformers (e.g., LLMs, VLMs), backpropagation, optimization, and fine-tuning techniques.
  • Familiarity with C++ is a plus (especially for implementing low-level quantization kernels or inference optimizations).

Benefits

Comp & perks
  • Competitive salary
  • Flexible work arrangements
  • Professional development opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • low-bit quantization
  • knowledge distillation
  • model pruning
  • Quantization-Aware Training (QAT)
  • Post-Training Quantization (PTQ)
  • neural network architectures
  • transformers
  • backpropagation
  • optimization
  • fine-tuning techniques
Soft Skills
  • documentation
  • stakeholder communication
  • collaboration
  • research
Certifications
  • PhD in NLP
  • PhD in Machine Learning