P-1 AI

Member of Technical Staff – Evals

The Member of Technical Staff for Evals is responsible for the evaluations that verify the AI system's learning and skill performance. The role involves designing, implementing, and validating evaluation benchmarks for AI systems.

Posted 5/11/2026 • Full-time • California, 🇺🇸 United States • Lead • 💰 $170,000 - $200,000 per year

Tech Stack

Tools & technologies
Python

About the role

Key responsibilities & impact
  • Implement and operate the system for organizing, transforming, running, grading, and reporting on eval benchmarks.
  • Design and execute the process by which we develop and QA our evals, incorporating contributions from our own engineering team, industrial partners, and subject-matter experts.
  • Ensure that evals run effectively within our CI/CD system, continuously benchmarking our evolving AI platform and the experiments we’re performing around it.
  • Create methods for detecting and testing for common quality challenges of AI, including hallucinations, undesirable stochasticity, and regressions.
  • Be a technical leader in the consistent implementation and organization of automated tests across other areas of our technology stacks.

Requirements

What you’ll need
  • Experience in constructing comprehensive test suites for software and/or AI systems, including coordinating the contributions of others.
  • Experience designing metrics to evaluate systems and visualize their performance, including differences across successive generations.
  • Experience in developing, managing, and running evals against LLM-based systems is a strong plus.
  • Good communication skills with a variety of stakeholders (AI researchers, domain experts, application developers).
  • Proficiency in Python programming, including complex modules, and in modern software development tools and practices (Git, CI/CD, etc.).
  • Ability to thrive in a fast-paced, dynamic startup environment.

Benefits

Comp & perks
  • healthcare
  • dental
  • vision insurance
  • 401k with employer matching
  • unlimited PTO

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • Python
  • test suites
  • metrics design
  • LLM-based systems
  • automated tests
  • quality challenges
  • CI/CD
  • software development practices
  • performance visualization
Soft Skills
  • communication skills
  • leadership
  • collaboration
  • adaptability