Cosine similarity filtering for reliable RAG pipelines: Advancing context-aware tutoring systems in education

Authors

Song, Lei
Mehr, Farnaz
Sharifzadeh, Hamid

Date

2025-12

Type

Conference Contribution - Oral Presentation

Keyword

textbooks
ChatGPT-RAG
retrieval-augmented generation (RAG)
semantic computing
large language models
AI hallucinations
AI in learning and assessment
AI in education
artificial intelligence (AI)
digital literacy

Citation

Song, L., Mehr, F., & Sharifzadeh, H. (2025, December 1-5). Cosine similarity filtering for reliable RAG pipelines: Advancing context-aware tutoring systems in education [Paper presentation]. ITP Rangahau & Research Symposium 2025 + OPSITARA 2025, New Zealand. https://hdl.handle.net/10652/7121

Abstract

This research aims to improve the consistency of the responses that an educational AI system gives to learners. Specifically, it investigates hallucinated responses produced by current large language models (LLMs) when textbooks are embedded as their knowledge source. The objective is to build a cosine similarity filter for reliable Retrieval-Augmented Generation (RAG) pipelines in an educational AI system that supports learners with different academic backgrounds. RAG embeds knowledge into an LLM by chunking documents and normalising the resulting pieces of text into structured data. Well-known AI suppliers have used overlapping chunking, semantic chunking, recursive character text splitting, and header-based chunking. However, current feedback indicates poor consistency from RAG-assisted models. Addressing this issue directly affects the structured data in RAG pipelines and so improves understanding of the textbook content. Using educator-defined terminology, a cosine similarity filter can be applied to reduce noise after chunking in the RAG pipeline. An experiment was conducted with 10 web development textbooks. Educators created 30 questions to evaluate answers in certain, contextualised, and open-ended scenarios. The similarity between each response and the corresponding model answer was evaluated with the sentence-transformers/all-MiniLM-L6-v2 model. Current results show that cosine similarity filtering in ChatGPT-RAG raises similarity to 0.79, about 0.09 higher than without filtering. This confirms that existing chunking methods affect consistency. In ongoing trials, RAG with cosine similarity filtering is expected to outperform ChatGPT and ChatGPT-RAG. Cogniti (https://cogniti.ai/) will also be involved. This research improves the responses of educational AI systems by employing the cosine similarity filter to structure embedded textbook data. The findings highlight that combining experts' knowledge with a cosine similarity filter reduces the noise after RAG chunking. The approach also allows the filter to be adapted to a knowledge graph or to BERTScore. As a result, the system can provide more consistent responses to learners from diverse backgrounds.
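
The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which embedding model or threshold the filter uses, so the toy bag-of-words `embed` function, the `filter_chunks` helper, the `threshold` value, and the sample data below are all assumptions chosen to keep the example self-contained. In practice a sentence encoder would likely replace `embed`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_chunks(chunks, terminology, threshold=0.2):
    """Keep chunks whose best similarity to any educator-defined term meets the threshold.

    The threshold of 0.2 is a hypothetical value, not one reported in the study.
    """
    term_vectors = [embed(term) for term in terminology]
    kept = []
    for chunk in chunks:
        vec = embed(chunk)
        score = max((cosine_similarity(vec, tv) for tv in term_vectors), default=0.0)
        if score >= threshold:
            kept.append((chunk, score))
    return kept


# Hypothetical example data, not taken from the study.
terminology = ["css flexbox layout", "html semantic elements"]
chunks = [
    "Flexbox is a CSS layout model for arranging items in rows or columns.",
    "Copyright 2020. All rights reserved. Printed in the USA.",
    "Semantic HTML elements such as nav and article describe their content.",
]
kept = filter_chunks(chunks, terminology)
for chunk, score in kept:
    print(f"{score:.2f}  {chunk}")
```

Here the copyright boilerplate chunk shares no vocabulary with the educator-defined terms and is dropped, while the two on-topic chunks pass the filter. The same `cosine_similarity` function could, under the same assumptions, be used to score a generated response against a model answer.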

Copyright holder

Authors

Copyright notice

All rights reserved