
Unstructured Data Engineer
Leidos
Employment Type: Full-time
Location Type: Remote
Location: United States
Salary
💰 $107,900 - $195,050 per year
About the role
- Design, build, and manage end-to-end RAG pipelines for enterprise AI applications.
- Lead preprocessing of unstructured data, including discovery, classification, cleansing, redaction, and metadata enrichment.
- Develop and optimize document chunking, embedding, and vectorization strategies for structured and unstructured datasets.
- Coordinate ingestion of curated datasets into vector databases and AI platforms.
- Package curated unstructured datasets as governed, reusable data products for enterprise consumption.
- Define and implement metadata tagging strategies to align with Collibra governance standards.
- Partner with Data Governance and Data Quality teams to ensure AI-ready data meets enterprise standards for lineage, classification, and compliance.
- Evaluate and optimize embedding models, retrieval strategies, and indexing performance.
- Monitor and tune RAG pipeline performance, including latency, retrieval accuracy, and cost efficiency.
- Implement automation for document ingestion, transformation, and publishing workflows.
- Support integration with enterprise AI platforms (e.g., ChatGPT Enterprise, AskSage, Moveworks).
- Conduct cost analysis and capacity planning for vector storage and processing workloads.
- Provide technical guidance on AI data readiness and unstructured data lifecycle management.
- Design, implement, and optimize enterprise-grade RAG and prompt engineering frameworks, including context engineering strategies (chunking, metadata enrichment, semantic filtering, dynamic context management) to improve retrieval accuracy, grounding, and response quality.
- Develop and maintain scalable multi-modal data pipelines that ingest, preprocess, embed, and integrate text, documents, images, audio, and structured data into governed vectorized data products consumable by enterprise AI platforms.
Requirements
- Bachelor’s degree in Computer Science, Data Engineering, AI/ML, or related field and 8+ years of relevant experience.
- Hands-on experience designing and implementing RAG architectures in production environments.
- Experience working with unstructured data (PDFs, documents, email, transcripts, images with OCR, etc.).
- Strong proficiency in Python and experience with NLP/LLM frameworks (e.g., LangChain, LlamaIndex, Hugging Face, OpenAI APIs).
- Experience with vector databases (e.g., Pinecone, Weaviate, FAISS, OpenSearch, Azure AI Search).
- Experience implementing document chunking, embedding generation, and similarity search.
- Understanding of metadata modeling and governance principles.
- Experience building scalable data pipelines in cloud environments (AWS, Azure, or GCP).
- Hands-on experience with prompt engineering, evaluation metrics, and context window optimization.
- Strong understanding of multi-modal data processing and pipeline engineering.
- Strong knowledge of API integration and microservices architecture.
- US Citizenship is required.
Benefits
- Competitive compensation
- Health and Wellness programs
- Income Protection
- Paid Leave
- Retirement
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
RAG architecture, unstructured data processing, Python, NLP frameworks, document chunking, embedding generation, vector databases, data pipeline engineering, API integration, prompt engineering
Soft Skills
technical guidance, collaboration, communication, leadership, problem-solving, analytical thinking, capacity planning, cost analysis, data governance, data quality assurance
Education
Bachelor’s degree in Computer Science, Bachelor’s degree in Data Engineering, Bachelor’s degree in AI/ML