Distyl AI

Applied AI Researcher, Benchmarking

full-time

Location Type: Hybrid

Location: San Francisco • California • 🇺🇸 United States

Job Level

Mid-Level, Senior

Tech Stack

React

About the role

  • The Benchmarking team defines how progress is measured. Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact. They construct benchmarks that reflect real-world complexity.
  • Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment. They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability. Their insights drive both Distyl’s internal research priorities and industry-wide standards.

Requirements

  • Experience Designing and Running Evaluations: You’ve built or maintained benchmarks, test suites, or experimental frameworks to measure model or system performance.
  • Statistical and Analytical Rigor: You design fair, reproducible experiments and can extract signal from noisy empirical results.
  • Experience Building with Models, Not Just Building Models: We develop intelligent systems using models rather than training or fine-tuning them. Ideal candidates have expertise in compound AI systems, agentic collaboration, and associated techniques (ensembling, ReAct, graph-of-thoughts, etc.).
  • Proven Track Record of Research Results: Whether you’ve published in top journals, posted amazing work on Twitter, or shared it somewhere else, we want to see what you’ve done.
  • Uses AI Every Day: Before you can revolutionize someone else’s workflow, you need to revolutionize yours. You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow.
  • Strong Programming and Data Analysis Skills: While you might not consider yourself a software engineer, you need to be able to build prototypes of your ideas and then run the experiments that prove their effectiveness to a Fortune 500 Head of AI.
  • Bias Toward Showing vs. Telling: Our customers want to see the power of AI today rather than discuss the most elegant idea that will take 5 years to realize.

Benefits

  • Competitive salary and benefits package, including equity options
  • Medical/dental/vision covered at 100% for you and your dependents
  • 401(k) plan
  • Perks such as commuter benefits and lunch provided in office
