
Data Scientist
Guild.ai
full-time
Location Type: Hybrid
Location: San Francisco • California • United States
About the role
- Define What “Good” Means: Partner with founders, engineering, and design to define product KPIs and quality metrics—especially around AI behaviors (helpfulness, correctness, reliability, latency, cost, user trust).
- Build the Measurement Foundation: Establish event taxonomy, instrumentation standards, and core datasets. Ensure we can answer product questions quickly and confidently.
- Create AI Evaluation & Monitoring Systems: Develop offline/online evaluation approaches for agentic workflows (e.g., golden sets, human review loops, heuristic + model-based scoring, regression detection, error taxonomies).
- Run Experiments That Change Decisions: Design and analyze A/B tests and quasi-experiments; pair statistical rigor with fast iteration.
- Turn Insight into Action: Produce analyses, narratives, and recommendations that directly shape roadmap tradeoffs and product direction.
- Enable Self-Serve Analytics: Build dashboards and lightweight tooling that help the entire team understand usage, performance, and customer outcomes.
- Be a Cross-Functional Glue Layer: Work tightly with engineering on logging/telemetry, with PM on prioritization, and with GTM on customer conversations to connect product behavior to real-world impact.
- Define Data Science at Guild.ai: Establish best practices for metrics, experimentation, and decision-making frameworks that scale as the team grows.
Requirements
- Strong foundations in statistics, experimentation, and causal reasoning, with a track record of driving product decisions through data
- Fluency in SQL and Python, and comfort working across the data stack (from raw events to analysis-ready datasets)
- Experience building analytics and measurement systems in a fast-moving environment (startup and/or high-ownership teams)
- Ability to translate ambiguous questions into well-scoped analyses and clear recommendations
- High judgment and crisp communication—especially when data is incomplete or messy
- A founder’s mentality: comfortable building from scratch, prioritizing ruthlessly, and owning outcomes end-to-end
- Experience evaluating or monitoring LLMs / agentic systems (quality measurement, human-in-the-loop evals, regression testing, safety/reliability metrics)
- Familiarity with developer tools, infrastructure, observability, or Git-based workflows
- Comfort with modern data tooling (warehouses, dbt, orchestration, BI) and event-driven architectures
- Experience establishing experimentation and analytics culture at an early-stage startup
Benefits
- Significant equity in an early-stage, venture-backed startup
- Comprehensive Health Benefits (Medical, Dental, Vision)
- Flexible PTO to ensure you have the time you need to recharge