Title Hybrid retrieval methods and an attention mechanism in medical retrieval-augmented generation systems
Translation of Title Hibridiniai paieškos metodai ir dėmesio sutelkimo mechanizmas medicininėje informacijos paieškos sistemoje.
Authors Braun, Ugnius Byron
Pages 124
Keywords [eng] Retrieval-Augmented Generation, Large Language Models, Vector Database, Semantic Similarity
Abstract [eng] This thesis evaluates information retrieval methods for medical document collections in a retrieval-augmented generation (RAG) setting. Clinical guideline documents were preprocessed to extract text, tables and images into a unified textual format, and multiple retrieval strategies were implemented and compared. Experimental results showed that dense semantic retrieval alone provides limited accuracy (33%), while subunit-based scoring improves early ranks (42%) and hybrid semantic-lexical retrieval achieves the strongest performance (88%). Native sparse attention reranking did not yield meaningful improvements when trained on a small subset of question-answer pairs (35%).
Dissertation Institution Vilniaus universitetas.
Type Master thesis
Language English
Publication date 2026