Data Scientist

Guild.ai

As the first Data Scientist at Guild.ai, you will establish the company’s truth layer for AI-native developer workflows, partnering with engineering and design to define metrics and ensure product reliability.

Posted 4/12/2026 • Full-time • San Francisco • California • 🇺🇸 United States • Mid-Level, Senior

Tech Stack

Tools & technologies

Python, SQL

About the role

Key responsibilities & impact

Define What “Good” Means: Partner with founders, engineering, and design to define product KPIs and quality metrics—especially around AI behaviors (helpfulness, correctness, reliability, latency, cost, user trust).
Build the Measurement Foundation: Establish event taxonomy, instrumentation standards, and core datasets. Ensure we can answer product questions quickly and confidently.
Create AI Evaluation & Monitoring Systems: Develop offline/online evaluation approaches for agentic workflows (e.g., golden sets, human review loops, heuristic + model-based scoring, regression detection, error taxonomies).
Run Experiments That Change Decisions: Design and analyze A/B tests and quasi-experiments; bring statistical rigor without sacrificing iteration speed.
Turn Insight into Action: Produce analyses, narratives, and recommendations that directly shape roadmap tradeoffs and product direction.
Enable Self-Serve Analytics: Build dashboards and lightweight tooling that help the entire team understand usage, performance, and customer outcomes.
Be a Cross-Functional Glue Layer: Work tightly with engineering on logging/telemetry, with PM on prioritization, and with GTM on customer conversations to connect product behavior to real-world impact.
Define Data Science at Guild.ai: Establish best practices for metrics, experimentation, and decision-making frameworks that scale as the team grows.

Requirements

What you’ll need

Strong foundations in statistics, experimentation, and causal reasoning, with a track record of driving product decisions through data
Fluency in SQL and Python, and comfort working across the data stack (from raw events to analysis-ready datasets)
Experience building analytics and measurement systems in a fast-moving environment (startup and/or high-ownership teams)
Ability to translate ambiguous questions into well-scoped analyses and clear recommendations
High judgment and crisp communication—especially when data is incomplete or messy
A founder’s mentality: comfortable building from scratch, prioritizing ruthlessly, and owning outcomes end-to-end
Experience evaluating or monitoring LLMs / agentic systems (quality measurement, human-in-the-loop evals, regression testing, safety/reliability metrics)
Familiarity with developer tools, infrastructure, observability, or Git-based workflows
Comfort with modern data tooling (warehouses, dbt, orchestration, BI) and event-driven architectures
Experience establishing experimentation and analytics culture at an early-stage startup

Benefits

Comp & perks

Significant equity in an early-stage, venture-backed startup
Comprehensive Health Benefits (Medical, Dental, Vision)
Flexible PTO to ensure you have the time you need to recharge

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

statistics, experimentation, causal reasoning, SQL, Python, analytics systems, measurement systems, A/B testing, data analysis, event-driven architectures

Soft Skills

communication, judgment, problem-solving, prioritization, ownership, translating ambiguity, collaboration, analytical thinking, decision-making, narrative production