Salary
💰 $150,000 - $190,000 per year
Tech Stack
NumPy, pandas, Python, SQL
About the role
- Master our AI agent platform and understand complex financial compliance quality requirements
- Shadow experienced team members on LLM output evaluation and customer quality assessments
- Establish evaluation metrics for compliance use cases like sanctions screening and AML investigations
- Own end-to-end quality assessment frameworks for major bank and fintech AI deployments
- Build comprehensive evaluation dashboards and statistical analysis of LLM performance across customer environments
- Partner with engineering and product teams to establish quality guidelines and improvement recommendations
- Design and implement rigorous evaluation frameworks for LLM outputs in highly regulated environments
- Lead statistical analysis of AI performance trends, failure modes, and improvement opportunities across customer deployments
- Collaborate with Engineering, Customer, and Product teams as the AI quality expert
Requirements
- 3-5 years of industry experience in data science, AI evaluation, or machine learning, with a focus on quality assessment
- Bachelor's or Master's degree in Computer Science, Statistics, Mathematics, Physics, or another quantitative field
- Strong proficiency in Python and experience with statistical analysis libraries (pandas, NumPy, SciPy, Matplotlib)
- Experience with LLM evaluation techniques, prompt analysis, and AI system quality assessment
- Strong statistical knowledge including experimental design, hypothesis testing, and performance measurement
- Experience with SQL, data visualization, and building dashboards for stakeholder communication
- Understanding of evaluation metrics, A/B testing, and statistical significance in AI system assessment
- Able to work in person Monday through Friday in our SF office
- Legally authorized to work in the United States (or indicate sponsorship needs)