LTS

RAG and Evaluation Engineer

LTS is hiring a RAG and Evaluation Engineer to apply AI to legacy system modernization. The role focuses on retrieval quality and the evaluation harness, working within a senior engineering team.

Posted 6/12/2026 • Full-time • Remote • 🇺🇸 United States • Mid-Level, Senior

Tech Stack

Tools & technologies
Python, TypeScript

About the role

Key responsibilities & impact
  • Own the knowledge surface — ingestion pipelines for source code, structured metadata, technical documentation, patches, and additional corpora the customer provides.
  • Own retrieval quality — chunking, embeddings, hybrid retrieval, reranking, and freshness.
  • Own the eval harness — benchmarks for translation accuracy, dependency-map correctness, and overall agent quality.
  • Run A/B testing and regression detection across prompts, retrieval, and model changes.
  • Operate the feedback loop from production usage back into evals and retrieval.
  • Define what “good” means for the platform when no one else has a clear view, so the team can tell whether the agent is actually improving.
  • Pair with the Agent Engineers on the prompt-and-eval iteration cycle.

Requirements

What you’ll need
  • Bachelor’s degree in Computer Science, Engineering, Information Science, or a related field, plus 4 years of professional software engineering experience; equivalent experience may substitute for the degree requirement.
  • Has shipped a production RAG system with quality the candidate can describe in numbers (rigor matters more than scale).
  • Ability to work in a fast-paced, collaborative environment.
  • Production experience with retrieval pipelines — ingestion, chunking, embedding, hybrid retrieval, reranking.
  • Strong applied evaluation skills — benchmark design, regression detection, LLM-as-judge patterns.
  • Knows when BM25 beats embeddings and when neither is enough.
  • Measures everything they ship; opinions about chunking are backed by benchmarks.
  • Patient with detail; comfortable defining metrics before the team has agreed on them.
  • Heavy native use of AI tooling: agents in parallel, model as collaborator.
  • Strong TypeScript or Python.
  • Demonstrated experience in a remote work environment.

Benefits

Comp & perks
  • The opportunity to support high-visibility federal missions in IT and healthcare
  • A culture that values innovation, growth, collaboration, and quality
  • Access to cutting-edge tools and technologies
  • Comprehensive benefits for you and your family
  • A career path that rewards ambition and performance

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
software engineering, retrieval pipelines, chunking, embedding, hybrid retrieval, reranking, benchmark design, regression detection, TypeScript, Python
Soft Skills
collaborative, detail-oriented, evaluation skills, ability to define metrics, patient