Title Machine learning based narrative search in the information space
Translation of Title Mašininio mokymosi metodais pagrįsta naratyvų paieška informacinėje erdvėje.
Authors Pranauskas, Gvidas
Pages 71
Keywords [eng] topic modeling, natural language processing, large language models, dynamic topic modeling, BERTopic, ANTM, embeddings, clustering, algorithmic topic models. temų modeliavimas, natūralios kalbos apdorojimas, didieji kalbos modeliai, dinaminis temų modeliavimas, BERTopic, ANTM, įterpiniai, klasterizavimas, algoritminiai temų modeliai.
Abstract [eng] With the ever-increasing amount of information available, the need for new methods to systematize and analyze it becomes apparent. This research proposes and validates a dynamic topic modeling process for identifying and tracking time-varying narratives in textual datasets. The study analyzes the 2024 Lithuanian parliamentary election using a dataset of 7385 documents collected from both digital and traditional media sources (TV, radio, and press). The analysis is conducted using two algorithmic topic modeling approaches, BERTopic and ANTM, which combine embeddings, dimensionality reduction, clustering, and narrative extraction using Large Language Models (LLMs). The research also explores LLM-based evaluation metrics for measuring topic model performance. Key findings reveal that BERTopic is better at maintaining temporal topic coherence across time steps, while ANTM produces more diverse topics but struggles with thematic consistency. This research shows how advanced NLP techniques can systematically identify evolving narratives in complex information spaces, with potential applications for media professionals, political strategists, or journalists investigating the dynamics of public opinion.
Dissertation Institution Vilniaus universitetas.
Type Master thesis
Language English
Publication date 2025