AI Knowledge Data Engineer

iBusiness Funding

Full-time

Posted on: 12/14/2025

Location Type: Remote

Location: Remote • Florida • 🇺🇸 United States

Salary

💰 $180,000 - $240,000 per year

Job Level

Mid-Level, Senior

Tech Stack

Airflow, Elasticsearch, Python, PyTorch, Spark, TensorFlow

About the role

Architect, implement, and optimize retrieval-augmented generation (RAG) workflows by integrating local LLMs (e.g., Llama) with retrieval mechanisms (vector search, Elasticsearch, FAISS, Weaviate)
Design, build, and maintain scalable data pipelines for ingesting, transforming, indexing, and retrieving structured and unstructured data from diverse sources
Design, build, and scale addressable services and tool specifications that LLMs and agents can use to orchestrate workflows
Orchestrate and scale training data operations, including data curation, versioning, and lineage tracking for large-scale LLM training and fine-tuning
Develop and maintain ontologies, knowledge graphs, and semantic data models to structure and integrate domain-specific knowledge for improved retrieval and reasoning
Implement and optimize knowledge retrieval strategies (dense/sparse retrieval, ranking algorithms) to maximize system accuracy and relevance
Unify disparate knowledge bases and heterogeneous data sources to provide access to relevant contextual information
Design cognitive memory systems for AI agents, enabling persistent knowledge retention and contextual awareness across interactions
Collaborate with AI researchers, data scientists, and engineers to align knowledge architecture with business objectives and ensure data quality
Evaluate and integrate new technologies and research advancements in LLMs, RAG, information retrieval, and knowledge representation
Maintain clear and comprehensive documentation of models, pipelines, and workflows

Requirements

Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field
Proven experience designing and scaling data pipelines and training data workflows for LLMs or similar AI systems
Strong background in information retrieval systems, vector search technologies, and RAG frameworks (e.g., FAISS, Pinecone, Elasticsearch, Milvus)
Proficiency in programming (Python) and machine learning libraries (TensorFlow, PyTorch)
Experience with ontologies, knowledge graphs, and semantic technologies (RDF, OWL, SPARQL)
Familiarity with distributed data processing and orchestration tools (e.g., Spark, Airflow, Kubeflow)
Excellent analytical, problem-solving, and communication skills
Ability to work collaboratively in a cross-functional, fast-paced environment

Benefits

Medical, dental, and vision coverage
401(k) with company match
Paid time off

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills

retrieval-augmented generation, data pipelines, information retrieval, vector search, knowledge graphs, ontologies, machine learning, programming, Python, semantic technologies

Soft skills

analytical skills, problem-solving, communication skills, collaboration, adaptability