Title Automatizuotas gramatinių struktūrų mažinimas lietuvių kalboje
Translation of Title Automated text summarization for Lithuanian language.
Authors Jaloveckienė, Indrė
Pages 44
Abstract [eng] In this work we present a method for automated text summarization for the Lithuanian language. The main goal is to create a method, and later a program, that automates text summarization. Unlike most existing summarization methods, we aim to generate an abstractive summary rather than an extractive one. The program takes a text in Lithuanian as input and is expected to output a very short but precise text that is as close as possible to a human-written summary. First, we reviewed existing summarization methods and identified the strengths and weaknesses of each. Then we created a theoretical model for generating a Lithuanian summary from a text. What is new in our method is that it includes syntactic analysis of sentences, not only statistical analysis. With this addition we expected to obtain a summary that is much more precise and closer to human-written text. In later sections we describe the implementation of our model. Text preprocessing was done in the Python programming language, as it already has quite a lot of built-in natural language processing tools. All data is then imported into a Microsoft SQL Server database, where the rest of the analysis is done. To sum up the work that has been done, we created a model for generating a Lithuanian summary from a text. In addition, we started to implement the method programmatically but ran into the problem that syntactic analysis is not yet fully developed for the Lithuanian language, and because of that we were not able to fully implement and test our model. However, we present ways in which the model could be tested and evaluated in the future. In the meantime, the author suggests thinking carefully before choosing abstractive summarization methods over extractive ones: the former might be more effective, but the latter can give quick results with the least effort.
Dissertation Institution Vilniaus universitetas.
Type Master thesis
Language Lithuanian
Publication date 2018