Title Skiemenų statistikos taikymas atskiriant poeziją nuo prozos
Translation of Title Discriminating poetry and prose using syllable statistics.
Authors Murauskas, Gediminas ; Radavičius, Marijus
DOI 10.15388/LJS.2022.31988
Is Part of Lietuvos statistikos darbai. Vilnius : Vilniaus universiteto leidykla. 2022, vol. 61, pp. 32-45. ISSN 1392-642X. eISSN 2029-7262
Keywords [eng] logistic regression ; automatic syllabification ; cross-validation ; training ; classification error
Abstract [eng] The aim of the paper is to construct a universal classifier that discriminates short Lithuanian text excerpts of poetry from those of prose. Here, universality means that the classifier is relatively insensitive to text content and to the author's style. Since syllables represent phonetic properties and are less sensitive to text content than words are, classifier training is based on the frequencies of syllables in the texts to be classified. The text data is taken from the digitized library http://ebiblioteka.mkp.emokykla.lt. The error rate of the trained classifier on test excerpts of 100 words is less than 5%.
Published Vilnius : Vilniaus universiteto leidykla
Type Journal article
Language Lithuanian
Publication date 2022
CC license CC license description
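
The approach summarized in the abstract (syllable frequencies fed to a logistic regression classifier) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation: the vowel-group rule in `syllabify` is a crude stand-in for real Lithuanian syllabification, the function names are hypothetical, the toy excerpts are made up rather than taken from the paper's corpus, and the cross-validation mentioned in the keywords is omitted.

```python
import math
import re
from collections import Counter

VOWELS = "aeiouyąęėįųū"
WORD_RE = re.compile(r"[a-ząčęėįšųūž]+")
# One syllable = optional consonants + a vowel run;
# word-final consonants are attached to the last syllable.
SYLLABLE_RE = re.compile(rf"[^{VOWELS}]*[{VOWELS}]+(?:[^{VOWELS}]+$)?")


def syllabify(word):
    """Crude vowel-group split (an assumption, not the paper's syllabifier)."""
    return SYLLABLE_RE.findall(word) or [word]


def syllable_frequencies(text):
    """Relative frequency of each syllable in a text excerpt."""
    words = WORD_RE.findall(text.lower())
    counts = Counter(s for w in words for s in syllabify(w))
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


def predict_proba(x, weights, bias):
    """Probability that the excerpt with sparse features x is poetry."""
    z = bias + sum(weights.get(s, 0.0) * v for s, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=500, lr=1.0):
    """Fit logistic regression on sparse feature dicts by batch gradient descent."""
    weights, bias = Counter(), 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, y in zip(samples, labels):
            err = predict_proba(x, weights, bias) - y
            for s, v in x.items():
                grad_w[s] += err * v
            grad_b += err
        for s, g in grad_w.items():
            weights[s] -= lr * g / n
        bias -= lr * grad_b / n
    return weights, bias


# Made-up toy excerpts (not from the paper's corpus); 1 = poetry, 0 = prose.
texts = [
    "lia lia lia dainuoja",
    "oi lia lia oi",
    "jis nuėjo namo vakare",
    "ji parašė laišką",
]
labels = [1, 1, 0, 0]

features = [syllable_frequencies(t) for t in texts]
weights, bias = train_logistic(features, labels)
for t, x in zip(texts, features):
    print(round(predict_proba(x, weights, bias), 2), t)
```

Working with relative rather than absolute frequencies keeps excerpts of different lengths comparable, which matters when the classifier is applied to short fragments. A realistic experiment would need a proper syllabifier, a much larger corpus, and held-out test excerpts to estimate the error rate.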