CloudFactory

AI Engineer – LLM Quantization Specialist

Full-time

Location Type: Hybrid

Location: Jakarta • 🇮🇩 Indonesia

Job Level

Mid-Level • Senior

Tech Stack

Cloud • NumPy • Python • PyTorch • TensorFlow

About the role

  • Develop and implement **quantization and pruning strategies** for large language models (LLMs) to improve runtime efficiency and reduce memory footprint.
  • Collaborate with the AI Research team on **model architecture, fine-tuning, and deployment** of multilingual and multimodal models.
  • Evaluate and benchmark quantized models across hardware platforms (GPU, TPU, CPU, edge accelerators).
  • Contribute to the design and maintenance of **model optimization pipelines** (training, evaluation, conversion, inference).
  • Stay current with cutting-edge research on **model compression, distillation, and efficient inference frameworks**.
  • Support continuous integration of optimized models into production and internal tools.
  • Document methodologies and share insights across global teams to promote technical excellence and reproducibility.
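
The pruning strategies mentioned above can take many forms; as a minimal illustrative sketch (not CloudFactory's actual pipeline), unstructured magnitude pruning in PyTorch looks like this:

```python
# Minimal unstructured magnitude-pruning sketch, assuming PyTorch.
# The layer sizes and pruning amount are illustrative choices only.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 256)

# Zero out the 50% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = float((layer.weight == 0).sum()) / layer.weight.numel()
print(f"sparsity: {sparsity:.2f}")  # prints "sparsity: 0.50"
```

In practice, unstructured sparsity like this only reduces memory footprint or latency when paired with sparse-aware kernels or weight re-packing; structured pruning (removing whole channels or heads) trades accuracy for speedups on standard hardware.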

Requirements

  • Bachelor’s or Master’s degree in **Computer Science, Machine Learning, Electrical Engineering,** or related field.
  • **5+ years of professional experience** in AI/ML engineering, with a focus on deep learning model optimization.
  • Hands-on experience with **quantization techniques** (e.g., PTQ, QAT, INT8/FP16 quantization) using frameworks like **PyTorch, TensorFlow, or ONNX Runtime.**
  • Solid understanding of **LLM architectures** (e.g., Transformer-based models such as GPT, LLaMA, Mistral, Falcon).
  • Strong programming skills in **Python**, including proficiency with **CUDA, NumPy, and PyTorch** internals.
  • Experience with **distributed training/inference** and deployment on cloud or edge infrastructure.
  • Excellent communication skills and comfort working in a **remote, cross-functional, international environment.**

Preferred Qualifications

  • Experience with **quantization-aware training (QAT)** and post-training quantization (PTQ).
  • Familiarity with **Hugging Face Transformers, DeepSpeed, or TensorRT.**
  • Contributions to open-source ML optimization libraries or toolkits.
  • Knowledge of **low-level performance profiling and benchmarking** (e.g., NVIDIA Nsight, PyTorch Profiler).
  • Prior experience collaborating with global AI research teams across time zones.
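
For candidates unfamiliar with the terminology: post-training quantization (PTQ), as named in the requirements, can be sketched with PyTorch's dynamic quantization API. This is an illustrative example, not a description of CloudFactory's workflow:

```python
# Minimal post-training dynamic quantization (PTQ) sketch, assuming PyTorch.
# The toy model below stands in for an LLM; dynamic quantization converts
# Linear layers to INT8 weights, with activations quantized at runtime.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
out = quantized(x)
print(out.shape)  # prints "torch.Size([1, 128])"
```

Quantization-aware training (QAT), by contrast, inserts fake-quantization ops during fine-tuning so the model learns to compensate for rounding error, typically recovering more accuracy at low bit widths than PTQ.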

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
quantization techniques • pruning strategies • model architecture • fine-tuning • model optimization pipelines • model compression • distillation • efficient inference frameworks • distributed training • deep learning model optimization
Soft skills
excellent communication skills • collaboration • technical excellence • reproducibility • working in a remote environment • cross-functional teamwork • international collaboration