
Unstructured Data Engineer
Leidos
Employment Type: Full-time
Location Type: Remote
Location: United States
Salary
💰 $107,900 - $195,050 per year
About the role
- Design, build, and manage end-to-end RAG pipelines for enterprise AI applications.
- Lead preprocessing of unstructured data, including discovery, classification, cleansing, redaction, and metadata enrichment.
- Develop and optimize document chunking, embedding, and vectorization strategies for structured and unstructured datasets.
- Coordinate ingestion of curated datasets into vector databases and AI platforms.
- Package curated unstructured datasets as governed, reusable data products for enterprise consumption.
- Define and implement metadata tagging strategies to align with Collibra governance standards.
- Partner with Data Governance and Data Quality teams to ensure AI-ready data meets enterprise standards for lineage, classification, and compliance.
- Evaluate and optimize embedding models, retrieval strategies, and indexing performance.
- Monitor and tune RAG pipeline performance, including latency, retrieval accuracy, and cost efficiency.
- Implement automation for document ingestion, transformation, and publishing workflows.
- Support integration with enterprise AI platforms (e.g., ChatGPT Enterprise, AskSage, Moveworks).
- Conduct cost analysis and capacity planning for vector storage and processing workloads.
- Provide technical guidance on AI data readiness and unstructured data lifecycle management.
- Design, implement, and optimize enterprise-grade RAG and prompt engineering frameworks, including context engineering strategies (chunking, metadata enrichment, semantic filtering, dynamic context management) to improve retrieval accuracy, grounding, and response quality.
- Develop and maintain scalable multi-modal data pipelines that ingest, preprocess, embed, and integrate text, documents, images, audio, and structured data into governed vectorized data products consumable by enterprise AI platforms.
Requirements
- Bachelor’s degree in Computer Science, Data Engineering, AI/ML, or related field and 8+ years of relevant experience.
- Hands-on experience designing and implementing RAG architectures in production environments.
- Experience working with unstructured data (PDFs, documents, email, transcripts, images with OCR, etc.).
- Strong proficiency in Python and experience with NLP/LLM frameworks (e.g., LangChain, LlamaIndex, Hugging Face, OpenAI APIs).
- Experience with vector databases (e.g., Pinecone, Weaviate, FAISS, OpenSearch, Azure AI Search).
- Experience implementing document chunking, embedding generation, and similarity search.
- Understanding of metadata modeling and governance principles.
- Experience building scalable data pipelines in cloud environments (AWS, Azure, or GCP).
- Hands-on experience with prompt engineering, evaluation metrics, and context window optimization.
- Strong understanding of multi-modal data processing and pipeline engineering.
- Strong knowledge of API integration and microservices architecture.
- US Citizenship is required.
Benefits
- Competitive compensation
- Health and Wellness programs
- Income Protection
- Paid Leave
- Retirement
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
RAG architecture, unstructured data processing, Python, NLP frameworks, document chunking, embedding generation, vector databases, data pipeline engineering, API integration, prompt engineering
Soft Skills
technical guidance, collaboration, communication, leadership, problem-solving, analytical thinking, capacity planning, cost analysis, data governance, data quality assurance
Education
Bachelor’s degree in Computer Science, Bachelor’s degree in Data Engineering, Bachelor’s degree in AI/ML