Lietuviškų tekstų klasifikavimas naudojant skiemenis

Title	Lietuviškų tekstų klasifikavimas naudojant skiemenis
Translation of Title	Lithuanian text classification using syllables
Authors	Žeruolienė, Agnė
Pages	38
Abstract [eng]	Rapid extraction of key information from text documents is a pressing problem, since text makes up the bulk of the unstructured data being generated. The complexity of the Lithuanian language and its variety of word forms complicate the task further. This motivates the search for text-characterizing elements that are structurally simpler than the word. The application of methods based on Lithuanian syllables has not been studied before. This work explores the syllable properties of fiction texts written in or translated into Lithuanian and investigates whether syllable characteristics can be used for text classification. A new algorithm for classifying text fragments by genre is developed using two-stage logistic regression. In the first stage, syllable odds are modeled using binomial logistic regression. In the second stage, the characteristics of these odds are modeled, and other syllable features are also used for classification. The developed algorithm is compared with other classification algorithms.
Dissertation Institution	Vilniaus universitetas.
Type	Master's thesis
Language	Lithuanian
Publication date	2021
