| Title |
Hybrid retrieval methods and an attention mechanism in medical retrieval-augmented generation systems |
| Translation of Title |
Hibridiniai paieškos metodai ir dėmesio sutelkimo mechanizmas medicininėje informacijos paieškos sistemoje. |
| Authors |
Braun, Ugnius Byron |
| Pages |
124 |
| Keywords [eng] |
Retrieval-Augmented Generation, Large Language Models, Vector Database, Semantic Similarity (Lithuanian: paieška papildytas teksto generavimas, dideli kalbos modeliai, vektorinė duomenų bazė, semantinis panašumas) |
| Abstract [eng] |
This thesis evaluates information retrieval methods for medical document collections in a retrieval-augmented generation (RAG) setting. Clinical guideline documents were preprocessed to extract text, tables and images into a unified textual format, and multiple retrieval strategies were implemented and compared. Experimental results showed that dense semantic retrieval alone provides limited accuracy (33%), while subunit-based scoring improves early ranks (42%) and hybrid semantic-lexical retrieval achieves the strongest performance (88%). Native sparse attention reranking did not yield meaningful improvements when trained with a small subset of question-answer pairs (35%). |
| Dissertation Institution |
Vilniaus universitetas. |
| Type |
Master thesis |
| Language |
English |
| Publication date |
2026 |