Cosine similarity filtering for reliable RAG pipelines: Advancing context-aware tutoring systems in education
Authors
Song, Lei
Mehr, Farnaz
Sharifzadeh, Hamid
Date
2025-12
Type
Conference Contribution - Oral Presentation
Keyword
textbooks
ChatGPT-RAG
retrieval-augmented generation (RAG)
semantic computing
large language models
AI hallucinations
AI in learning and assessment
AI in education
artificial intelligence (AI)
digital literacy
Citation
Song, L., Mehr, F., & Sharifzadeh, H. (2025, December 1-5). Cosine similarity filtering for reliable RAG pipelines: Advancing context-aware tutoring systems in education [Paper presentation]. ITP Rangahau & Research Symposium 2025 + OPSITARA 2025, New Zealand.
https://hdl.handle.net/10652/7121
Abstract
This research aims to improve the consistency of the responses that an educational AI system gives to learners. Specifically, it investigates the hallucinated responses that current large language models (LLMs) produce when textbooks are embedded as a knowledge source. The objective is to build a cosine similarity filter for reliable Retrieval-Augmented Generation (RAG) pipelines in an educational AI system that supports learners with different academic backgrounds.
RAG supplies knowledge to an LLM by chunking documents and normalising the resulting pieces of text into structured data. Well-known AI suppliers have used overlapping chunking, semantic chunking, recursive character text splitting, and header-based chunking. However, current feedback indicates that RAG-assisted models still respond with poor consistency. Addressing this issue means improving the structured data in the RAG pipeline so that the system's understanding of the textbook improves.
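As a rough illustration of the first chunking strategy named above, overlapping chunking can be sketched in a few lines of Python. This is a minimal, character-based sketch and not the implementation of any particular supplier or of this study; the function name `chunk_with_overlap` and the default sizes are assumptions chosen only for the example.

```python
def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks; consecutive chunks share `overlap` characters.

    Hypothetical helper for illustration only; sizes are arbitrary defaults.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some text twice.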
With educator-defined terminology, a cosine similarity filter can be applied after chunking to reduce noise in the RAG pipeline. An experiment was conducted with 10 web development textbooks. Educators created 30 questions covering certain, contextualised, and open-ended scenarios. The similarity between each response and its model answer was measured with the sentence-transformers/all-MiniLM-L6-v2 model. Current results show that cosine similarity filtering in ChatGPT-RAG raises the similarity to 0.79, about 0.09 higher than without filtering. This confirms that existing chunking methods affect consistency. In ongoing trials, RAG with cosine similarity filtering is expected to outperform both ChatGPT and ChatGPT-RAG. Cogniti (https://cogniti.ai/) will also be involved.
This research improves the responses of educational AI systems by using the cosine similarity filter to structure embedded textbook data. The findings highlight that combining experts' knowledge with a cosine similarity filter reduces the noise left after RAG chunking. The filter can also be adapted to a knowledge graph and to BERTScore. As a result, the system can provide more consistent responses to learners from diverse backgrounds.
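The filtering step described in this abstract could look roughly like the sketch below: each chunk embedding is compared with the embeddings of educator-defined terms, and a chunk is kept only if its best cosine similarity reaches a threshold. The abstract does not give the implementation, so the function names, the toy two-dimensional vectors, and the threshold of 0.5 are all assumptions; in practice the vectors would come from an embedding model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def filter_chunks(
    chunks: Sequence[str],
    chunk_vectors: Sequence[Sequence[float]],
    term_vectors: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the study
) -> list[tuple[str, float]]:
    """Keep chunks whose best similarity to any educator term meets the threshold."""
    kept = []
    for chunk, vector in zip(chunks, chunk_vectors):
        best = max((cosine_similarity(vector, t) for t in term_vectors), default=0.0)
        if best >= threshold:
            kept.append((chunk, best))
    return kept


if __name__ == "__main__":
    # Toy embeddings standing in for real model output.
    chunks = ["CSS selectors", "Publisher address"]
    chunk_vectors = [[1.0, 0.2], [0.0, 1.0]]
    term_vectors = [[1.0, 0.0]]  # one educator-defined term
    for text, score in filter_chunks(chunks, chunk_vectors, term_vectors):
        print(f"{text}: {score:.2f}")
```

The same `cosine_similarity` function could also serve the evaluation step, comparing the embedding of a generated response with that of the model answer.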
Copyright holder
Authors
Copyright notice
All rights reserved
