Title Tekstų nehomogeniškumo tyrimas naudojant žymeklius
Translation of Title Analysis of text non-homogeneity using markers.
Authors Lapėnaitė-Gedvilė, Monika ; Piaseckienė, Karolina ; Radavičius, Marijus
DOI 10.15388/LJS.2015.13884
Is Part of Lithuanian journal of statistics = Lietuvos statistikos darbai. Vilnius : Lietuvos statistikos departamentas. 2015, vol. 54, no. 1, p. 92-100. ISSN 1392-642X. eISSN 2029-7262
Keywords [eng] statistical linguistics ; over-dispersion ; deviance ; binomial logistic regression ; functional words
Abstract [eng] The aim of the paper is to assess the distributional non-homogeneity of texts in their usage of functional words and other linguistic units. The empirical study is based on fiction works recommended for school reading, taken from a digital library at http://ebiblioteka.mkp.emokykla.lt. Sets of frequent word forms, called markers, are compiled, and their frequency counts in blocks of 50 successive sentences are calculated. The frequency counts of the markers show significant excess variability (overdispersion) relative to the text homogeneity model usually assumed in linguistics. For selected markers, several kinds of hierarchical binomial logistic regression models, with the author's identifier, the block length, and the frequency counts of the remaining markers as explanatory variables, are fitted to the block data in order to explain the observed overdispersion.
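Illustration (not from the paper): the overdispersion check described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes each block contributes a marker count and a block length in words, treats the homogeneity model as a single binomial proportion shared by all blocks, and uses the Pearson chi-square statistic divided by its degrees of freedom rather than the deviance named in the keywords. The function name `dispersion_statistic` and the sample numbers are hypothetical.

```python
def dispersion_statistic(marker_counts, block_lengths):
    """Pearson chi-square / degrees of freedom under a common binomial proportion.

    marker_counts[i] : occurrences of the marker in block i (assumed input)
    block_lengths[i] : number of words in block i (assumed input)
    Values near 1 are consistent with a homogeneous binomial model;
    values well above 1 suggest overdispersion.
    """
    if len(marker_counts) != len(block_lengths):
        raise ValueError("counts and lengths must have the same number of blocks")
    if len(marker_counts) < 2:
        raise ValueError("at least two blocks are needed")

    # Pooled estimate of the marker proportion across all blocks.
    p_hat = sum(marker_counts) / sum(block_lengths)
    if p_hat <= 0.0 or p_hat >= 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")

    # Sum of squared standardized residuals under the binomial model.
    chi2 = sum(
        (y - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
        for y, n in zip(marker_counts, block_lengths)
    )
    # One degree of freedom is used up by estimating p_hat.
    return chi2 / (len(marker_counts) - 1)


if __name__ == "__main__":
    # Hypothetical blocks: marker counts and block lengths in words.
    counts = [12, 3, 25, 7, 18]
    lengths = [600, 550, 640, 580, 610]
    print(dispersion_statistic(counts, lengths))
```

A ratio far above 1 would only indicate that the single-proportion model fits poorly; explaining the excess variability with covariates such as the author's identifier or the block length requires the regression models mentioned in the abstract.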
Published Vilnius : Lietuvos statistikos departamentas
Type Journal article
Language Lithuanian
Publication date 2015