BTSE

ML Engineer – AI Platform Lead

full-time

Location Type: Remote

Location: Singapore

About the role

  • Deploy and optimise a large language model for production inference: quantisation, continuous batching, low-latency serving.
  • Build the RAG pipeline: document chunking, embedding generation, vector storage, cross-encoder reranking, and context assembly optimised for a 128K-token context window.
  • Build the context layer: per-tenant system prompts, dynamically retrieved few-shot exemplars, task routing (classifying incoming requests to the right prompt configuration).
  • Build defensive output parsing: structured JSON output from an unmodified base model with graceful fallbacks.
  • Design and implement the feedback collection pipeline: capturing user corrections and ratings, automatically generating training data candidates for future fine-tuning.
  • Design the custom model training workflow: tenant-scoped LoRA training on client-specific data, model evaluation, A/B testing, and isolated deployment.
  • Monitor and improve inference quality: parsing failure rates, citation accuracy, hallucination rates, latency — all tracked per tenant.
  • Iterate on prompts daily with the domain expert during the pilot phase.
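
For illustration only, the "defensive output parsing" responsibility above could look roughly like the sketch below. It is not BTSE's implementation. The function name `parse_model_json` and the shape of the returned dictionary are hypothetical, and the sketch assumes the model is asked for a single JSON object but may wrap it in prose or a Markdown code fence.

```python
import json
import re
from typing import Any, Dict, Optional

# Build the Markdown fence marker programmatically (three backticks).
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_model_json(raw: str, fallback: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Try progressively more lenient ways of extracting a JSON object.

    Hypothetical helper: returns {"ok": True, "data": ...} on success, or
    {"ok": False, "data": fallback, "raw": raw} so the caller can log the
    failure (e.g. per tenant) and degrade gracefully instead of crashing.
    """
    candidates = [raw.strip()]

    # Strategy 2: the object is wrapped in a Markdown code fence.
    fenced = FENCED_BLOCK.search(raw)
    if fenced:
        candidates.append(fenced.group(1).strip())

    # Strategy 3: the object is surrounded by free-form prose.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start:end + 1])

    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return {"ok": True, "data": parsed}

    # Graceful fallback: nothing parsed, so hand back the default payload.
    return {"ok": False, "data": fallback or {}, "raw": raw}
```

A caller would typically count the `ok: False` results per tenant, which is one way to obtain the parsing failure rate mentioned in the monitoring bullet.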

Requirements

  • 5+ years ML engineering; 2+ years working with large language models in production.
  • Hands-on experience with LLM serving frameworks (vLLM, TGI, or equivalent).
  • Deep experience building RAG pipelines: chunking strategies, embedding models, vector databases, reranking.
  • Strong prompt engineering skills for production applications — you know how to make a base model produce consistent, structured, high-quality output.
  • Python: PyTorch, Transformers, FastAPI.
  • Familiar with LoRA/QLoRA fine-tuning workflows.
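
As a rough illustration of what "chunking strategies" can mean in practice, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_words` and the default sizes are hypothetical. Splitting on whitespace is a simplifying assumption: a production pipeline would more likely count tokens with the embedding model's tokenizer and respect document structure.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks that share `overlap` words.

    Overlap keeps sentences that straddle a boundary retrievable from
    either neighbouring chunk, at the cost of some duplicated storage.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text, so the
        # tail is not emitted again as a redundant, shorter chunk.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be embedded and written to the vector store. Chunk size and overlap are tuning parameters that are usually evaluated against retrieval quality rather than fixed up front.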

Applicant Tracking System Keywords

Hard Skills & Tools
large language models, RAG pipeline, chunking strategies, embedding models, vector databases, reranking, prompt engineering, Python, PyTorch, FastAPI
Soft Skills
collaboration with domain experts, problem-solving, analytical thinking, attention to detail, communication