Guild.ai

Data Scientist

Full-time

Location Type: Hybrid

Location: San Francisco, California, United States

About the role

  • Define What “Good” Means: Partner with founders, engineering, and design to define product KPIs and quality metrics—especially around AI behaviors (helpfulness, correctness, reliability, latency, cost, user trust).
  • Build the Measurement Foundation: Establish event taxonomy, instrumentation standards, and core datasets. Ensure we can answer product questions quickly and confidently.
  • Create AI Evaluation & Monitoring Systems: Develop offline/online evaluation approaches for agentic workflows (e.g., golden sets, human review loops, heuristic + model-based scoring, regression detection, error taxonomies).
  • Run Experiments That Change Decisions: Design and analyze A/B tests and quasi-experiments; bring statistical rigor while preserving iteration speed.
  • Turn Insight into Action: Produce analyses, narratives, and recommendations that directly shape roadmap tradeoffs and product direction.
  • Enable Self-Serve Analytics: Build dashboards and lightweight tooling that help the entire team understand usage, performance, and customer outcomes.
  • Be a Cross-Functional Glue Layer: Work tightly with engineering on logging/telemetry, with PM on prioritization, and with GTM/customer conversations to connect product behavior to real-world impact.
  • Define Data Science at Guild.ai: Establish best practices for metrics, experimentation, and decision-making frameworks that scale as the team grows.

Requirements

  • Strong foundations in statistics, experimentation, and causal reasoning, with a track record of driving product decisions through data
  • Fluency in SQL and Python, and comfort working across the data stack (from raw events to analysis-ready datasets)
  • Experience building analytics and measurement systems in a fast-moving environment (startup and/or high-ownership teams)
  • Ability to translate ambiguous questions into well-scoped analyses and clear recommendations
  • High judgment and crisp communication—especially when data is incomplete or messy
  • A founder’s mentality: comfortable building from scratch, prioritizing ruthlessly, and owning outcomes end-to-end
  • Experience evaluating or monitoring LLMs / agentic systems (quality measurement, human-in-the-loop evals, regression testing, safety/reliability metrics)
  • Familiarity with developer tools, infrastructure, observability, or Git-based workflows
  • Comfort with modern data tooling (warehouses, dbt, orchestration, BI) and event-driven architectures
  • Experience establishing experimentation and analytics culture at an early-stage startup

Benefits

  • Significant equity in an early-stage, venture-backed startup
  • Comprehensive health benefits (medical, dental, vision)
  • Flexible PTO to ensure you have the time you need to recharge