
AI Engineer – LLM Quantization Specialist
CloudFactory
Full-time
Location Type: Hybrid
Location: Jakarta • 🇮🇩 Indonesia
Job Level
Mid-Level • Senior
Tech Stack
Cloud • NumPy • Python • PyTorch • TensorFlow
About the role
- Develop and implement **quantization and pruning strategies** for large language models (LLMs) to improve runtime efficiency and reduce memory footprint.
- Collaborate with the AI Research team on **model architecture, fine-tuning, and deployment** of multilingual and multimodal models.
- Evaluate and benchmark quantized models across hardware platforms (GPU, TPU, CPU, edge accelerators).
- Contribute to the design and maintenance of **model optimization pipelines** (training, evaluation, conversion, inference).
- Stay current with cutting-edge research on **model compression, distillation, and efficient inference frameworks**.
- Support continuous integration of optimized models into production and internal tools.
- Document methodologies and share insights across global teams to promote technical excellence and reproducibility.
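The core of the role — shrinking a model's memory footprint via quantization — can be illustrated with PyTorch's built-in dynamic quantization. This is a minimal sketch, not the team's actual pipeline: the tiny `nn.Sequential` stands in for an LLM, which would normally be loaded from a checkpoint.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM; a real model would be loaded from a checkpoint.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))
model.eval()

# Post-training dynamic quantization (PTQ): Linear weights are stored as
# INT8, while activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement at inference time.
x = torch.randn(1, 64)
with torch.no_grad():
    out = quantized(x)
print(out.shape)
```

Dynamic quantization needs no calibration data, which is why it is often the first PTQ technique tried; static PTQ and QAT trade extra effort for better accuracy at low bit-widths.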
Requirements
- Bachelor’s or Master’s degree in **Computer Science, Machine Learning, Electrical Engineering,** or related field.
- **5+ years of professional experience** in AI/ML engineering, with a focus on deep learning model optimization.
- Hands-on experience with **quantization techniques** (e.g., PTQ, QAT, INT8/FP16 quantization) using frameworks like **PyTorch, TensorFlow, or ONNX Runtime.**
- Solid understanding of **LLM architectures** (e.g., Transformer-based models such as GPT, LLaMA, Mistral, Falcon).
- Strong programming skills in **Python**, including proficiency with **CUDA, NumPy, and PyTorch** internals.
- Experience with **distributed training/inference** and deployment on cloud or edge infrastructure.
- Excellent communication skills and comfort working in a **remote, cross-functional, international environment.**
- **Preferred Qualifications:**
- Experience with **quantization-aware training (QAT)** and post-training quantization (PTQ).
- Familiarity with **Hugging Face Transformers, DeepSpeed, or TensorRT.**
- Contributions to open-source ML optimization libraries or toolkits.
- Knowledge of **low-level performance profiling and benchmarking** (e.g., NVIDIA Nsight, PyTorch Profiler).
- Prior experience collaborating with global AI research teams across time zones.
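The benchmarking side of the requirements — quantifying what quantization actually saves — can be sketched by comparing serialized model sizes before and after INT8 conversion. A minimal illustration with PyTorch, assuming a toy model in place of a real LLM; the `model_size_bytes` helper is hypothetical, not part of any library.

```python
import io
import torch
import torch.nn as nn

def model_size_bytes(m: nn.Module) -> int:
    # Serialize the state_dict to an in-memory buffer and measure it.
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes

fp32 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).eval()
int8 = torch.ao.quantization.quantize_dynamic(fp32, {nn.Linear}, dtype=torch.qint8)

# INT8 weights are roughly 4x smaller than FP32.
print(model_size_bytes(fp32), model_size_bytes(int8))
```

A fuller benchmark would also measure latency and task accuracy across target hardware (GPU, CPU, edge), since size reduction alone does not guarantee faster or equally accurate inference.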
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
quantization techniques • pruning strategies • model architecture • fine-tuning • model optimization pipelines • model compression • distillation • efficient inference frameworks • distributed training • deep learning model optimization
Soft skills
excellent communication skills • collaboration • technical excellence • reproducibility • working in a remote environment • cross-functional teamwork • international collaboration