
Member of Technical Staff – Data Scientist, Evals
Perplexity
Data Scientist specializing in LLM evaluations for Perplexity's products, building automated pipelines and metrics to improve answer quality across platforms.
Posted 6/29/2026 • Full-time • San Francisco, California, United States • Lead • $200,000 - $300,000 per year
Tech Stack
Tools & technologies: AWS, Cloud, Python, SQL
About the role
Key responsibilities & impact
- Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
- Design evaluation sets and methods that measure how tool calls (particularly web search retrieval) affect the quality of the final answer
- Develop solutions based on vision-language models (VLMs) to programmatically evaluate how final answers render visually across different platforms and devices
- Continuously review public benchmarks and academic evaluations for their applicability to Perplexity's products, adapting and incorporating relevant ones into regular performance measurement
- Work within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve answer quality
Requirements
What you’ll need
- PhD or MS in a technical field, or equivalent experience
- 4+ years of experience in data science or machine learning
- Strong proficiency in Python and SQL (expected to write production-grade code)
- Experience building within a modern cloud data stack, specifically AWS and Databricks
- Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster
- 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups (preferred)
- Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale (preferred)
- A strong research background, with experience applying research methods to real-world ML problems (preferred)
- Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets (preferred)
Benefits
Comp & perks
- Full-time U.S. employees enjoy a comprehensive benefits program including equity, health, dental, vision, retirement, fitness, commuter and dependent care accounts, and more.
- Full-time employees outside the U.S. enjoy a comprehensive benefits program tailored to their region of residence.
- USD salary ranges apply only to U.S.-based positions. International salaries are set based on the local market.
ATS Keywords
Hard Skills & Tools
Machine Learning, Automated Evaluation Pipelines, Evaluation Metrics Definition, Ground Truth Dataset Building, Research Method Application
Soft Skills
Collaboration, Adaptability
Certifications
PhD or MS in Technical Field