
ML Engineer – AI Platform Lead
BTSE
full-time
Location Type: Remote
Location: Hong Kong
About the role
- Deploy and optimise a large language model for production inference: quantisation, continuous batching, low-latency serving.
- Build the RAG pipeline: document chunking, embedding generation, vector storage, cross-encoder reranking, and context assembly optimised for a 128K-token context window.
- Build the context layer: per-tenant system prompts, dynamically retrieved few-shot exemplars, task routing (classifying incoming requests to the right prompt configuration).
- Build defensive output parsing: structured JSON output from an unmodified base model with graceful fallbacks.
- Design and implement the feedback collection pipeline: capturing user corrections and ratings, automatically generating training data candidates for future fine-tuning.
- Design the custom model training workflow: tenant-scoped LoRA training on client-specific data, model evaluation, A/B testing, and isolated deployment.
- Monitor and improve inference quality: parsing failure rates, citation accuracy, hallucination rates, and latency, all tracked per tenant.
- Iterate on prompts daily with the domain expert during the pilot phase.
Requirements
- 5+ years of ML engineering experience, including 2+ years working with large language models in production.
- Hands-on experience with LLM serving frameworks (vLLM, TGI, or equivalent).
- Deep experience building RAG pipelines: chunking strategies, embedding models, vector databases, reranking.
- Strong prompt engineering skills for production applications: you know how to make a base model produce consistent, structured, high-quality output.
- Python: PyTorch, Transformers, FastAPI.
- Familiarity with LoRA/QLoRA fine-tuning workflows.
- Experience building multi-tenant ML serving infrastructure.
- Experience with financial or crypto AI applications.
- Experience with cross-encoder reranking models (DeBERTa or similar).
- Understanding of data isolation requirements for ML training pipelines.
Applicant Tracking System Keywords
Hard Skills & Tools
large language models, RAG pipelines, prompt engineering, Python, PyTorch, Transformers, FastAPI, LoRA, QLoRA, cross-encoder reranking
Soft Skills
collaboration, problem-solving, communication, iteration, feedback collection