Title |
Skiemenų statistikos taikymas atskiriant poeziją nuo prozos |
Translation of Title |
Discriminating poetry and prose using syllable statistics. |
Authors |
Murauskas, Gediminas ; Radavičius, Marijus |
DOI |
10.15388/LJS.2022.31988 |
Is Part of |
Lietuvos statistikos darbai. Vilnius : Vilniaus universiteto leidykla. 2022, t. 61, p. 32-45. ISSN 1392-642X. eISSN 2029-7262 |
Keywords [eng] |
logistic regression ; automatic syllabification ; cross-validation ; training ; classification error |
Abstract [eng] |
The aim of the paper is to construct a universal classifier that discriminates short Lithuanian text excerpts of poetry from those of prose. Here, universality means that the classifier is relatively insensitive to text content and to the author's style. Since syllables represent phonetic properties and are less sensitive to text content than words are, the classifier is trained on the frequencies of syllables in the texts to be classified. The text data are taken from the digitized library http://ebiblioteka.mkp.emokykla.lt. The error rate of the trained classifier, applied to test excerpts of 100 words, is less than 5%. |
Published |
Vilnius : Vilniaus universiteto leidykla |
Type |
Journal article |
Language |
Lithuanian |
Publication date |
2022 |
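The approach outlined in the abstract (syllable frequencies fed to a logistic regression) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the splitter `crude_syllables` is a hypothetical stand-in for the paper's automatic syllabification and simply cuts after each vowel run, and the names `syllable_frequencies`, `train_logistic`, and `predict_proba`, the vocabulary argument, and the learning-rate and epoch settings are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a crude vowel-run splitter, NOT a real Lithuanian syllabifier.
_SYLLABLE = re.compile(r"[^aąeęėiįyouųū]*[aąeęėiįyouųū]+(?:[^aąeęėiįyouųū]+$)?")


def crude_syllables(word):
    """Cut a word after each vowel run; trailing consonants join the last chunk."""
    parts = _SYLLABLE.findall(word.lower())
    return parts if parts else [word.lower()]


def syllable_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary syllable in the excerpt."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(s for w in words for s in crude_syllables(w))
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return [counts[s] / total for s in vocabulary]


def predict_proba(x, weights, bias):
    """Logistic model: estimated probability that the excerpt is poetry (label 1)."""
    z = bias + sum(wj * xj for wj, xj in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit logistic regression by plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = predict_proba(xi, weights, bias) - yi  # d(log-loss)/dz
            weights = [wj - lr * grad * xj for wj, xj in zip(weights, xi)]
            bias -= lr * grad
    return weights, bias
```

In use, each labeled excerpt would be mapped to a feature vector with `syllable_frequencies`, the vectors passed to `train_logistic`, and held-out excerpts scored with `predict_proba`; the error rate could then be estimated by cross-validation, as the record's keywords suggest.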