Leidos

Unstructured Data Engineer

Full-time

Location Type: Remote

Location: United States

Salary

$107,900 - $195,050 per year

About the role

  • Design, build, and manage end-to-end RAG pipelines for enterprise AI applications.
  • Lead preprocessing of unstructured data, including discovery, classification, cleansing, redaction, and metadata enrichment.
  • Develop and optimize document chunking, embedding, and vectorization strategies for structured and unstructured datasets.
  • Coordinate ingestion of curated datasets into vector databases and AI platforms.
  • Package curated unstructured datasets as governed, reusable data products for enterprise consumption.
  • Define and implement metadata tagging strategies to align with Collibra governance standards.
  • Partner with Data Governance and Data Quality teams to ensure AI-ready data meets enterprise standards for lineage, classification, and compliance.
  • Evaluate and optimize embedding models, retrieval strategies, and indexing performance.
  • Monitor and tune RAG pipeline performance, including latency, retrieval accuracy, and cost efficiency.
  • Implement automation for document ingestion, transformation, and publishing workflows.
  • Support integration with enterprise AI platforms (e.g., ChatGPT Enterprise, AskSage, Moveworks).
  • Conduct cost analysis and capacity planning for vector storage and processing workloads.
  • Provide technical guidance on AI data readiness and unstructured data lifecycle management.
  • Design, implement, and optimize enterprise-grade RAG and prompt engineering frameworks, including context engineering strategies (chunking, metadata enrichment, semantic filtering, dynamic context management) to improve retrieval accuracy, grounding, and response quality.
  • Develop and maintain scalable multi-modal data pipelines that ingest, preprocess, embed, and integrate text, documents, images, audio, and structured data into governed vectorized data products consumable by enterprise AI platforms.
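The chunking step named in the responsibilities above can be illustrated with a minimal sketch. This is not Leidos's implementation; the `chunk_size` and `overlap` parameters are illustrative defaults, and production pipelines would typically chunk on token or semantic boundaries rather than raw characters.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks, a common RAG
    preprocessing step before embedding. Overlap preserves context
    that would otherwise be cut at chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
    return chunks


# Example: a 500-character document yields 4 overlapping chunks.
doc = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(doc)
```

Each chunk would then be passed to an embedding model and written to a vector database along with its metadata tags.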

Requirements

  • Bachelor’s degree in Computer Science, Data Engineering, AI/ML, or related field and 8+ years of relevant experience.
  • Hands-on experience designing and implementing RAG architectures in production environments.
  • Experience working with unstructured data (PDFs, documents, email, transcripts, images with OCR, etc.).
  • Strong proficiency in Python and experience with NLP/LLM frameworks (e.g., LangChain, LlamaIndex, Hugging Face, OpenAI APIs).
  • Experience with vector databases (e.g., Pinecone, Weaviate, FAISS, OpenSearch, Azure AI Search).
  • Experience implementing document chunking, embedding generation, and similarity search.
  • Understanding of metadata modeling and governance principles.
  • Experience building scalable data pipelines in cloud environments (AWS, Azure, or GCP).
  • Hands-on experience with prompt engineering, evaluation metrics, and context window optimization.
  • Strong understanding of multi-modal data processing and pipeline engineering.
  • Strong knowledge of API integration and microservices architecture.
  • US Citizenship is required.
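The "embedding generation and similarity search" requirement above can be sketched end to end. The hashed bag-of-words `embed` function below is a toy stand-in for a real embedding model (e.g., one served via the OpenAI APIs mentioned above), used only so the example is self-contained; the cosine-similarity retrieval loop is the part that mirrors what a vector database does internally.

```python
import hashlib
import math
from collections import Counter


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy deterministic embedding: hash each token into one of `dim`
    buckets and count occurrences. A stand-in for a real model."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += count
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

A query such as `top_k("vector database", corpus, k=1)` returns the corpus document sharing the most vocabulary with the query, which is the retrieval half of a RAG pipeline in miniature.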

Benefits
  • Competitive compensation
  • Health and Wellness programs
  • Income Protection
  • Paid Leave
  • Retirement

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to improve ATS matching.

Hard Skills & Tools
RAG architecture, unstructured data processing, Python, NLP frameworks, document chunking, embedding generation, vector databases, data pipeline engineering, API integration, prompt engineering
Soft Skills
technical guidance, collaboration, communication, leadership, problem-solving, analytical thinking, capacity planning, cost analysis, data governance, data quality assurance
Education
Bachelor's degree in Computer Science, Bachelor's degree in Data Engineering, Bachelor's degree in AI/ML