Design, develop, and deploy production-grade LLM systems and agentic workflows for clinical applications, ensuring they are safe, reliable, and clinically validated.
Lead architecture and system design efforts for complex LLM-based applications, including retrieval-augmented generation (RAG) systems, multi-agent frameworks, and clinical decision support tools.
Develop comprehensive evaluation frameworks and benchmarks to assess LLM performance on clinical tasks, including accuracy, safety, bias, and hallucination detection (a minimal evaluation-harness sketch follows this list).
Implement and optimize prompt engineering strategies using frameworks like DSPy, LangChain, or similar tools to systematically improve model performance.
Collaborate with clinicians, product managers, regulatory affairs, and engineering teams to translate clinical needs into robust AI solutions that meet healthcare standards.
Create clear, interpretable metrics and visualizations to communicate model and system performance to both technical and non-technical stakeholders, including clinical users and regulatory bodies.
Mentor and coach ML engineers and data scientists, fostering a culture of responsible AI development, technical excellence, and continuous learning.
Champion LLM engineering best practices including prompt versioning, evaluation-driven development, safety guardrails, and monitoring for production systems.
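For illustration only, the sketch below shows the kind of evaluation-driven workflow referenced above. It uses only the Python standard library; the benchmark case, scoring rule, and stub model are hypothetical placeholders, not a prescribed or clinically validated implementation.

```python
# Hypothetical evaluation-harness sketch: scores an LLM callable against a
# small clinical-style benchmark and reports accuracy plus a crude
# hallucination-flag rate. The cases, scoring rule, and stub model are
# placeholders, not a clinically validated methodology.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    prompt: str
    expected_substring: str   # ground-truth phrase the answer must contain
    allowed_terms: frozenset  # vocabulary the answer may draw on


def evaluate(model: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    correct = flagged = 0
    for case in cases:
        answer = model(case.prompt)
        if case.expected_substring.lower() in answer.lower():
            correct += 1
        # Crude hallucination proxy: the answer introduces tokens outside the
        # allowed vocabulary. Production systems would use stronger checks.
        tokens = {t.strip(".,").lower() for t in answer.split()}
        if tokens - case.allowed_terms:
            flagged += 1
    n = len(cases)
    return {"accuracy": correct / n, "hallucination_flag_rate": flagged / n}


if __name__ == "__main__":
    cases = [
        EvalCase(
            prompt="Which serum value defines hyperkalemia?",
            expected_substring="potassium",
            allowed_terms=frozenset({"serum", "potassium", "above", "5.0", "mmol/l"}),
        ),
    ]
    # Stand-in for a real LLM API call.
    stub_model = lambda prompt: "Serum potassium above 5.0 mmol/L"
    print(evaluate(stub_model, cases))
```

In practice the stub would be replaced by a real model call and the vocabulary-based proxy by clinically grounded fact-checking and safety review.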
Requirements
Proficiency in Python, with deep expertise in LLM frameworks and tooling.
5+ years experience in ML/AI engineering, with 3+ years specifically working with large language models in production environments.
3+ years experience building and deploying LLM applications using modern frameworks (LangChain, LlamaIndex, DSPy, Haystack, or similar).
3+ years experience with prompt engineering, optimization, and systematic prompt improvement methodologies.
3+ years experience designing evaluation frameworks, benchmarks, and metrics for ML systems, particularly for generative AI and LLMs.
3+ years experience with agentic systems, tool use, function calling, and multi-step reasoning workflows.
3+ years experience with retrieval systems, vector databases, and RAG architectures (e.g., Pinecone, Weaviate, ChromaDB, FAISS); a toy retrieval sketch follows this list.
5+ years experience deploying scalable ML services in the cloud (e.g., AWS, GCP, Azure) using Docker and Kubernetes.
Strong understanding of LLM safety, including hallucination detection, bias mitigation, and adversarial robustness.
Experience with fine-tuning, RLHF, or other model adaptation techniques.
Experience with end-to-end monitoring and observability (e.g., Grafana, Prometheus), user telemetry and alerting, model throughput tracking, and structured logging.
Experience building annotation workflows and tooling, ETL/cleaning pipelines, schema design, data provenance tracking, and dataset version control.
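As a toy illustration of the retrieval side of a RAG architecture mentioned above, the numpy-only sketch below embeds a small corpus with bag-of-words counts and returns the top-k chunks by cosine similarity. The corpus, vectorizer, and query are assumed placeholders standing in for a real embedding model and vector database.

```python
# Toy RAG-retrieval sketch: bag-of-words vectors and cosine similarity stand
# in for a real embedding model and vector database (e.g., FAISS, Pinecone).
# The corpus and query are illustrative placeholders.
import numpy as np

CORPUS = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "ACE inhibitors can cause a persistent dry cough.",
    "Warfarin dosing is monitored with the INR.",
]


def tokenize(text: str) -> list:
    return [t.strip(".,?") for t in text.lower().split()]


def embed(texts: list, vocab: dict) -> np.ndarray:
    # Term-count vectors over a fixed vocabulary (stand-in for learned embeddings).
    mat = np.zeros((len(texts), len(vocab)))
    for i, text in enumerate(texts):
        for token in tokenize(text):
            if token in vocab:
                mat[i, vocab[token]] += 1.0
    return mat


def retrieve(query: str, k: int = 2) -> list:
    vocab = {t: j for j, t in enumerate(sorted({w for doc in CORPUS for w in tokenize(doc)}))}
    doc_vecs = embed(CORPUS, vocab)
    query_vec = embed([query], vocab)[0]
    # Cosine similarity; the epsilon guards against a zero-norm query vector.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * (np.linalg.norm(query_vec) + 1e-9)
    )
    # Return the top-k chunks that would be prepended to the LLM prompt.
    return [CORPUS[i] for i in np.argsort(sims)[::-1][:k]]


if __name__ == "__main__":
    print(retrieve("What is used to monitor warfarin dosing?"))
```

A production system would swap the toy vectorizer for learned embeddings and an approximate-nearest-neighbor index, and would add chunking, reranking, and grounding checks.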
Benefits
Paid Time Off (PTO)
Health, Dental, Vision and Life insurance
401k Retirement Savings Plan
Employee Discounts
Voluntary benefits