Member of Technical Staff – Evals
P-1 AI
Member of Technical Staff responsible for evals to ensure AI learning and skill performance. Involves designing, implementing, and validating evaluation benchmarks for AI systems.
Tech Stack
Tools & technologies
- Python
About the role
Key responsibilities & impact
- Implement and operate the system for organizing, transforming, running, grading, and reporting on eval benchmarks.
- Design and execute the process by which we develop and QA our evals, incorporating contributions from our own engineering team, industrial partners, and subject-matter experts.
- Ensure that evals run effectively within our CI/CD system, continuously benchmarking our evolving AI platform and the experiments we’re performing around it.
- Create methods for detecting and testing for common quality challenges of AI, including hallucinations, undesirable stochasticity, and regressions.
- Be a technical leader in the consistent implementation and organization of automated tests across other areas of our technology stacks.
Requirements
What you’ll need
- Experience in constructing comprehensive test suites for software and/or AI systems, including coordinating the contributions of others.
- Experience designing metrics to evaluate systems and visualize their performance, including differences across successive generations.
- Experience in developing, managing, and running evals against LLM-based systems is a strong plus.
- Good communication skills with a variety of stakeholders (AI researchers, domain experts, application developers).
- Proficiency in Python programming, including complex modules, and in modern software development tools and practices (Git, CI/CD, etc.).
- Ability to thrive in a fast-paced, dynamic startup environment.
Benefits
Comp & perks
- healthcare
- dental
- vision insurance
- 401(k) with employer matching
- unlimited PTO
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Python
- test suites
- metrics design
- LLM-based systems
- automated tests
- quality challenges
- CI/CD
- software development practices
- performance visualization
Soft Skills
- communication skills
- leadership
- collaboration
- adaptability