Salary
💰 $160,000 - $190,000 per year
Tech Stack
Docker · Elasticsearch · Kubernetes
About the role
- Architect and optimize NLP pipelines for intent detection, entity extraction, and dialogue management
- Own the Search/RAG backend: design, build, and tune vector search indexes of medical texts for high recall/precision
- Implement document ingestion pipelines to ensure fast, accurate knowledge retrieval
- Integrate search results seamlessly into LLM prompts to improve evidence-based answers
- Define evaluation metrics (e.g., F1, ROUGE, latency) and establish ML model CI/CD practices
- Deploy scalable services with FastAPI and Kubernetes
- First-month focus: audit the existing search pipelines, stand up an ingestion→vectorization→retrieval pipeline, and define baseline evaluation metrics
- 90-day OKRs: improve recall/precision, integrate RAG into production for real doctor interactions, and deliver a scalable model-evaluation framework
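To make the first-month focus concrete, here is a minimal sketch of the ingestion→vectorization→retrieval loop the role owns. It uses a toy hashed bag-of-words "embedding" in place of a real transformer encoder and an in-memory list in place of Elasticsearch or FAISS; the names `embed` and `Retriever` are illustrative, not part of any actual stack.

```python
import math

DIM = 64  # toy embedding dimensionality

def embed(text: str) -> list[float]:
    """Toy embedding: hash tokens into a fixed-size vector, then L2-normalize.
    A production system would call a transformer encoder here."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[hash(token) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class Retriever:
    """In-memory stand-in for a vector index (Elasticsearch/FAISS/Pinecone)."""

    def __init__(self) -> None:
        self.index: list[tuple[str, list[float]]] = []

    def ingest(self, doc: str) -> None:
        # Ingestion step: vectorize the document and add it to the index.
        self.index.append((doc, embed(doc)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Retrieval step: rank documents by cosine similarity to the query
        # (vectors are unit-length, so the dot product is the cosine).
        q = embed(query)
        scored = sorted(
            self.index,
            key=lambda item: -sum(a * b for a, b in zip(q, item[1])),
        )
        return [doc for doc, _ in scored[:k]]

# Usage: ingest a few (fabricated) medical snippets, then retrieve.
r = Retriever()
for doc in [
    "metformin is first-line therapy for type 2 diabetes",
    "ACE inhibitors reduce blood pressure",
    "ibuprofen is an NSAID used for pain relief",
]:
    r.ingest(doc)

hits = r.search("first-line treatment for type 2 diabetes", k=1)
```

The top hit would then be stuffed into the LLM prompt as retrieved evidence; swapping `embed` for a real encoder and `Retriever` for a managed index changes nothing about the overall shape of the pipeline.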
Requirements
- 5+ years of experience in machine learning/NLP engineering
- Strong expertise with information retrieval systems (e.g., Elasticsearch, OpenSearch, Pinecone, FAISS)
- Hands-on experience with transformer models (e.g., BERT, GPT) and RAG techniques
- Skilled in production deployment (FastAPI, Kubernetes, Docker)
- Track record of designing evaluation frameworks and CI/CD for ML models
- Familiarity with biomedical or clinical text (nice-to-have)
- Experience in large-scale search or recommendation systems (nice-to-have)
- Background in healthcare AI or regulated (HIPAA-compliant) environments (nice-to-have)