RAG and Evaluation Engineer

LTS

RAG and Evaluation Engineer at LTS, applying AI to legacy system modernization. The role focuses on retrieval quality and the evaluation harness within a senior engineering team.

Posted 6/12/2026 • full-time • Remote • 🇺🇸 United States • Mid-Level, Senior

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

software engineering, retrieval pipelines, chunking, embedding, hybrid retrieval, reranking, benchmark design, regression detection, TypeScript, Python

Soft Skills

collaborative, detail-oriented, evaluation skills, ability to define metrics, patient

Tools & Technologies

AI tooling, production RAG system

Industry Keywords

ingestion pipelines, translation accuracy, dependency-map correctness, A/B testing, LLM-as-judge patterns

Tech Stack

Tools & technologies

Python, TypeScript

About the role

Key responsibilities & impact

Own the knowledge surface — ingestion pipelines for source code, structured metadata, technical documentation, patches, and additional corpora the customer provides.
Own retrieval quality — chunking, embeddings, hybrid retrieval, reranking, and freshness.
Own the eval harness — benchmarks for translation accuracy, dependency-map correctness, and overall agent quality.
Run A/B testing and regression detection across prompts, retrieval, and model changes.
Operate the feedback loop from production usage back into evals and retrieval.
Define what “good” means for the platform when no one else has a clear view, so the team can tell whether the agent is actually improving.
Pair with the Agent Engineers on the prompt-and-eval iteration cycle.

Requirements

What you’ll need

Bachelor’s degree in Computer Science, Engineering, Information Science, or a related field, plus 4 years of professional software engineering experience; equivalent experience may substitute for the degree requirement.
Has shipped a production RAG system with quality the candidate can describe in numbers (rigor matters more than scale).
Ability to work in a fast-paced, collaborative environment.
Production experience with retrieval pipelines — ingestion, chunking, embedding, hybrid retrieval, reranking.
Strong applied evaluation skills — benchmark design, regression detection, LLM-as-judge patterns.
Knows when BM25 beats embeddings and when neither is enough.
Measures everything they ship; opinions about chunking are backed by benchmarks.
Patient with detail; comfortable defining metrics before the team has agreed on them.
Heavy native use of AI tooling: agents in parallel, model as collaborator.
Strong TypeScript or Python.
Demonstrated experience in a remote work environment.

Benefits

Comp & perks

The opportunity to support high-visibility federal missions in IT and healthcare
A culture that values innovation, growth, collaboration, and quality
Access to cutting-edge tools and technologies
Comprehensive benefits for you and your family
A career path that rewards ambition and performance