CloudFactory

AI Engineer – LLM Quantization Specialist

Full-time

Location Type: Hybrid

Location: Jakarta • 🇮🇩 Indonesia

Job Level

Mid-Level • Senior

Tech Stack

Cloud • NumPy • Python • PyTorch • TensorFlow

About the role

  • Develop and implement **quantization and pruning strategies** for large language models (LLMs) to improve runtime efficiency and reduce memory footprint.
  • Collaborate with the AI Research team on **model architecture, fine-tuning, and deployment** of multilingual and multimodal models.
  • Evaluate and benchmark quantized models across hardware platforms (GPU, TPU, CPU, edge accelerators).
  • Contribute to the design and maintenance of **model optimization pipelines** (training, evaluation, conversion, inference).
  • Stay current with cutting-edge research on **model compression, distillation, and efficient inference frameworks**.
  • Support continuous integration of optimized models into production and internal tools.
  • Document methodologies and share insights across global teams to promote technical excellence and reproducibility.
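
The pruning strategies mentioned above can take many forms; as a minimal illustrative sketch (not CloudFactory's actual pipeline), unstructured magnitude pruning in PyTorch looks like this:

```python
# Minimal unstructured magnitude-pruning sketch, assuming PyTorch.
# The layer sizes and pruning amount are illustrative choices only.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 256)

# Zero out the 50% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = float((layer.weight == 0).sum()) / layer.weight.numel()
print(f"sparsity: {sparsity:.2f}")  # prints "sparsity: 0.50"
```

In practice, unstructured sparsity like this only reduces memory footprint or latency when paired with sparse-aware kernels or weight re-packing; structured pruning (removing whole channels or heads) trades accuracy for speedups on standard hardware.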

Requirements

  • Bachelor’s or Master’s degree in **Computer Science, Machine Learning, Electrical Engineering,** or related field.
  • **5+ years of professional experience** in AI/ML engineering, with a focus on deep learning model optimization.
  • Hands-on experience with **quantization techniques** (e.g., PTQ, QAT, INT8/FP16 quantization) using frameworks like **PyTorch, TensorFlow, or ONNX Runtime.**
  • Solid understanding of **LLM architectures** (e.g., Transformer-based models such as GPT, LLaMA, Mistral, Falcon).
  • Strong programming skills in **Python**, including proficiency with **CUDA, NumPy, and PyTorch** internals.
  • Experience with **distributed training/inference** and deployment on cloud or edge infrastructure.
  • Excellent communication skills and comfort working in a **remote, cross-functional, international environment.**

Preferred Qualifications

  • Experience with **quantization-aware training (QAT)** and post-training quantization (PTQ).
  • Familiarity with **Hugging Face Transformers, DeepSpeed, or TensorRT.**
  • Contributions to open-source ML optimization libraries or toolkits.
  • Knowledge of **low-level performance profiling and benchmarking** (e.g., NVIDIA Nsight, PyTorch Profiler).
  • Prior experience collaborating with global AI research teams across time zones.
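
For candidates unfamiliar with the terminology: post-training quantization (PTQ), as named in the requirements, can be sketched with PyTorch's dynamic quantization API. This is an illustrative example, not a description of CloudFactory's workflow:

```python
# Minimal post-training dynamic quantization (PTQ) sketch, assuming PyTorch.
# The toy model below stands in for an LLM; dynamic quantization converts
# Linear layers to INT8 weights, with activations quantized at runtime.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
out = quantized(x)
print(out.shape)  # prints "torch.Size([1, 128])"
```

Quantization-aware training (QAT), by contrast, inserts fake-quantization ops during fine-tuning so the model learns to compensate for rounding error, typically recovering more accuracy at low bit widths than PTQ.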

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
quantization techniques • pruning strategies • model architecture • fine-tuning • model optimization pipelines • model compression • distillation • efficient inference frameworks • distributed training • deep learning model optimization
Soft skills
excellent communication skills • collaboration • technical excellence • reproducibility • working in a remote environment • cross-functional teamwork • international collaboration