
AI Knowledge Data Engineer
iBusiness Funding
full-time
Location Type: Remote
Location: Remote • Florida • 🇺🇸 United States
Salary
💰 $180,000 - $240,000 per year
Job Level
Mid-Level, Senior
Tech Stack
Airflow, Elasticsearch, Python, PyTorch, Spark, TensorFlow
About the role
- Architect, implement, and optimize retrieval-augmented generation (RAG) workflows by integrating local LLMs (e.g., Llama) with retrieval mechanisms (vector search, Elasticsearch, FAISS, Weaviate)
- Design, build, and maintain scalable data pipelines for ingesting, transforming, indexing, and retrieving structured and unstructured data from diverse sources
- Design, build, and scale addressable services and tool specifications that LLMs and agents can use to orchestrate workflows
- Orchestrate and scale training data operations, including data curation, versioning, and lineage tracking for large-scale LLM training and fine-tuning
- Develop and maintain ontologies, knowledge graphs, and semantic data models to structure and integrate domain-specific knowledge for improved retrieval and reasoning
- Implement and optimize knowledge retrieval strategies (dense/sparse retrieval, ranking algorithms) to maximize system accuracy and relevance
- Aggregate disparate knowledge bases and heterogeneous data into a unified source that gives access to relevant contextual information
- Design cognitive memory systems for AI agents, enabling persistent knowledge retention and contextual awareness across interactions
- Collaborate with AI researchers, data scientists, and engineers to align knowledge architecture with business objectives and ensure data quality
- Evaluate and integrate new technologies and research advancements in LLMs, RAG, information retrieval, and knowledge representation
- Maintain clear and comprehensive documentation of models, pipelines, and workflows
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field
- Proven experience designing and scaling data pipelines and training data workflows for LLMs or similar AI systems
- Strong background in information retrieval systems, vector search technologies, and RAG frameworks (e.g., FAISS, Pinecone, Elasticsearch, Milvus)
- Proficiency in programming (Python) and machine learning libraries (TensorFlow, PyTorch)
- Experience with ontologies, knowledge graphs, and semantic technologies (RDF, OWL, SPARQL)
- Familiarity with distributed data processing and orchestration tools (e.g., Spark, Airflow, Kubeflow)
- Excellent analytical, problem-solving, and communication skills
- Ability to work collaboratively in a cross-functional, fast-paced environment
Benefits
- Medical, dental, and vision coverage
- 401(k) with company match
- Paid time off
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
retrieval-augmented generation, data pipelines, information retrieval, vector search, knowledge graphs, ontologies, machine learning, programming, Python, semantic technologies
Soft skills
analytical skills, problem-solving, communication skills, collaboration, adaptability