Research Scientist

• Design and run experiments to measure accuracy, robustness, and hallucination rates in LLM agents
• Build automated evaluation pipelines (LLM-as-judge + human review) with clinical-grade benchmarks
• Partner with Research Ops/IRB to design efficacy studies and align with regulatory requirements
• Translate research into production-ready evaluation systems, collaborating with engineering to take new features from 0→1
• Develop error taxonomies, ablations, and guardrails to ensure safe and reliable agent behaviors
• Audit existing evaluation approaches for clinical and agentic tasks (first-month focus)
• Define initial benchmarks and build early automated pipelines (first-month focus)
• Partner with engineering to land CI gates for accuracy, factuality, and safety (first-month focus)
• Deliver a repeatable evaluation framework with automated pipelines in production (90-day OKR)
• Demonstrate measurable improvements in robustness, hallucination reduction, or safety (90-day OKR)
• Publish or present internal research findings that directly shape product reliability (90-day OKR)

Applied Research Scientist

Research Scientist, Computational Cognitive Scientist

Associate Research Scientist

Research Assistant – PRN

Research Assistant

Statistical Research Scientist I/II, Federal Research