Research Engineer – Evals
Firecrawl
Research Engineer building evaluation systems that measure Firecrawl's data extraction quality, designing benchmarks and feedback loops that improve model performance and output quality.
Posted 5/13/2026 • Full-time
San Francisco • California • 🇺🇸 United States
Mid-Level • Senior
💰 $160,000 - $240,000 per year
About the role
Key responsibilities & impact
- Build the eval stack from scratch
- Design and own the systems that measure whether Firecrawl's outputs are actually good — across scrape, crawl, extract, and map
- Design benchmarks that reflect reality
- Own LLM-as-judge pipelines
- Close the loop with models and RL
- Run fast experiments and communicate clearly
Requirements
What you’ll need
- 3+ years of experience in ML engineering, applied AI, or data quality, working on production systems
- Builds their own eval infrastructure
- Knows what 'good' means for unstructured web data
- Fluent in LLM evaluation methodology
- Production-minded
- Fast and clear
- Backgrounds that tend to do well: ML engineers who've built eval or data quality systems at AI labs or applied teams. Engineers who've worked on LLM fine-tuning or RLHF pipelines and understand how feedback quality drives model improvement. People who've worked at the intersection of data infrastructure and model development. Anyone who's been the person on the team asking 'but how do we know this actually works?'
Benefits
Comp & perks
- Salary that makes sense: $140,000-180,000/year (U.S.-based), based on impact, not tenure
- Own a piece — Up to 0.15% equity in what you're helping build
- Unlimited PTO — Minimum 3 weeks off encouraged; take the time you need to recharge
- Parental leave — 12 weeks fully paid, for moms and dads
- Wellness stipend — $100/month for the gym, therapy, massages, or whatever keeps you human
- Learning & Development - Expense up to $150/year toward anything that helps you grow professionally
- Team offsites — A change of scenery, minus the trust falls
- Sabbatical — 3 paid months off after 4 years, do something fun and new
- Full coverage, no red tape — Medical, dental, and vision (100% for employees, 50% for spouse/kids) — no weird loopholes, just care that works
- Life & Disability insurance — Employer-paid short-term disability, long-term disability, and life insurance — coverage for life's curveballs
- Supplemental options — Optional accident, critical illness, hospital indemnity, and voluntary life insurance for extra peace of mind
- Doctegrity telehealth — Talk to a doctor from your couch
- 401(k) plan — Retirement might be a ways off, but future-you will thank you
- Pre-tax benefits — Access to FSAs and commuter benefits to help your wallet out a bit
- Pet insurance — Because fur babies are family too
- SF HQ perks — Snacks, drinks, team lunches, and the occasional burst of chaotic startup energy
ATS Keywords (Applicant Tracking System)
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
ML engineering, applied AI, data quality, eval infrastructure, LLM evaluation methodology, LLM fine-tuning, RLHF pipelines, data infrastructure, model development
Soft Skills
production-minded, fast communication, clear communication