ML Engineer – AI Platform Lead

BTSE

Full-time

Posted on: 3/31/2026

Location Type: Remote

Location: Hong Kong

About the role

Deploy and optimise a large language model for production inference: quantisation, continuous batching, low-latency serving.
Build the RAG pipeline: document chunking, embedding generation, vector storage, cross-encoder reranking, and context assembly optimised for a 128K-token context window.
Build the context layer: per-tenant system prompts, dynamically retrieved few-shot exemplars, and task routing (classifying each incoming request and sending it to the right prompt configuration).
Build defensive output parsing that extracts structured JSON from an unmodified base model, with graceful fallbacks when parsing fails.
Design and implement the feedback collection pipeline: capturing user corrections and ratings, automatically generating training data candidates for future fine-tuning.
Design the custom model training workflow: tenant-scoped LoRA training on client-specific data, model evaluation, A/B testing, and isolated deployment.
Monitor and improve inference quality, tracking parsing failure rates, citation accuracy, hallucination rates, and latency per tenant.
Iterate on prompts daily with the domain expert during the pilot phase.

Requirements

5+ years of ML engineering experience, including 2+ years working with large language models in production.
Hands-on experience with LLM serving frameworks (vLLM, TGI, or equivalent).
Deep experience building RAG pipelines: chunking strategies, embedding models, vector databases, reranking.
Strong prompt engineering skills for production applications: you know how to make a base model produce consistent, structured, high-quality output.
Python, including PyTorch, Transformers, and FastAPI.
Familiarity with LoRA/QLoRA fine-tuning workflows.
Experience building multi-tenant ML serving infrastructure.
Experience with financial or crypto AI applications.
Experience with cross-encoder reranking models (DeBERTa or similar).
Understanding of data isolation requirements for ML training pipelines.