Salary
💰 $120,000 - $150,250 per year
Tech Stack
Python, SaltStack, Spring
About the role
- Own and evolve the evaluation frameworks for our AI and ML models, translating high-level trust principles into specific, measurable tests.
- Define and conduct rigorous experiments to resolve ambiguous questions about the safety, reliability, and impact of our models.
- Own and conduct critical analyses and experiments to measure the real-world impact of our AI, helping to define what it means for our systems to be technically robust, trustworthy, and outcome-driven, and to validate that they meet that bar.
- Collaborate with engineering partners to design and build production-quality code, creating automated, scalable, and pragmatic testing frameworks based on modern best practices.
- Partner with product, legal, and infrastructure teams to implement and monitor standards for trustworthy AI.
- Proactively identify gaps and develop novel evaluation approaches, which may include creating synthetic test data from user traces or building lightweight processes for non-technical partners to iterate on test sets.
- Synthesize complex evaluation results and industry trends into actionable insights and clearly communicate findings to diverse technical and non-technical stakeholders.
Requirements
- 2-3+ years of relevant industry experience in data science, machine learning, or a related field.
- Proficiency in Python and a solid understanding of core statistical concepts.
- Proven ability to write and review production-quality code.
- Proven experience in evaluating machine learning models; exposure to large language models (LLMs) is a strong plus.
- Hands-on experience in one or more of the following: analyzing A/B tests or other experiments with statistical rigor; using evaluation tools (e.g., LangSmith, open-source libraries) to iteratively measure and improve model performance; building data pipelines or tools to enable collaboration on test sets.
- Applied knowledge of concepts in AI ethics, such as fairness, bias, and interpretability.
- A strong interest in applying data science to complex, high-stakes domains like mental healthcare.
- A pragmatic and proactive approach to problem-solving, with a history of developing creative solutions to complex problems.
- Exceptional communication and teamwork skills, with a proven ability to collaborate effectively with diverse, cross-functional teams.
- A commitment to continuous learning and a passion for staying at the forefront of trends in AI safety, evaluation, and reliability.
- Must be based in the Salt Lake City metro area and be willing to commute 2-3 days a week when this role transitions to a hybrid schedule in 2026.