AI Research Engineer – Model Compression, Quantization
Tether.to
Apply low-bit quantization to reduce model size and inference latency for generative AI models (LLMs, VLMs, multimodal) while maintaining accuracy and output quality.
Tech Stack
Tools & technologies
- PyTorch
About the role
Key responsibilities & impact
- Apply low-bit quantization to reduce model size and inference latency for generative AI models (LLMs, VLMs, multimodal) while maintaining accuracy and output quality.
- Leverage knowledge distillation to transfer capabilities from larger teacher models to smaller student models, enabling efficient multimodal reasoning across text, image, and audio inputs.
- Implement pruning techniques to remove redundant parameters and attention heads, reducing computational overhead without sacrificing task performance.
- Analyze trade-offs between model efficiency (size, latency, memory) and accuracy across quantization, distillation, and pruning methods; propose improvements based on empirical findings.
- Research and apply mixed-precision quantization and other advanced compression strategies (e.g., adaptive pruning schedules, distillation with intermediate feature matching) to optimize the accuracy–performance balance.
- Stay current with the latest research in model compression, including emerging techniques for multimodal and generative architectures.
- Document methodologies, experiments, and results clearly to support reproducibility, internal collaboration, and stakeholder communication.
- Author technical papers and publish findings at top-tier conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ACL, AAAI) to advance the field of model compression for multimodal AI.
Requirements
What you’ll need
- A degree in Computer Science or a related field.
- Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).
- Experience with PyTorch or an equivalent deep learning framework.
- Hands-on experience with model quantization including both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).
- Research and hands-on experience with knowledge distillation for compressing large models into smaller, efficient ones.
- Research and hands-on experience with model pruning for compressing large models into smaller, efficient ones.
- Solid understanding of neural network architectures and training processes, including transformers (e.g., LLMs, VLMs), backpropagation, optimization, and fine-tuning techniques.
- Familiarity with C++ is a plus (especially for implementing low-level quantization kernels or inference optimizations).
Benefits
Comp & perks
- Competitive salary
- Flexible work arrangements
- Professional development opportunities
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- low-bit quantization
- knowledge distillation
- model pruning
- Quantization-Aware Training (QAT)
- Post-Training Quantization (PTQ)
- neural network architectures
- transformers
- backpropagation
- optimization
- fine-tuning techniques
Soft Skills
- documentation
- stakeholder communication
- collaboration
- research
Certifications
- PhD in NLP
- PhD in Machine Learning