Abstract [eng] |
Currently, the majority of research in automatic text summarisation focuses on creating new text summarisation models or improving existing ones. However, this trend is risky, because such studies do not ask whether the algorithm they are trying to improve suits the data in the first place. This master’s thesis tries to fill this gap in the literature by assessing whether different types of text summarisation algorithms perform equally well on academic papers from different scientific disciplines. More specifically, it applies the Pivoted QR Decomposition, Naïve Bayes, Decision Tree, Hidden Markov Model, and Support Vector Machine (SVM) text summarisation algorithms to academic papers from the fields of medicine, biology, computer science, and economics (including finance), and compares the results. According to the research, the academic field does not influence the accuracy of summarisation: when different algorithms were applied to different texts, the same algorithms (Naïve Bayes and SVM) performed best regardless of the text type. At the same time, when the algorithms were applied to aggregated data, the accuracy achieved was generally lower than when they were applied to the separate (non-aggregated) texts. Hence, it can be concluded that although the type of text does not influence summarisation much, it is better to use separate data than an aggregate in order to achieve higher summarisation accuracy. |