Title Classification of Lithuanian text into translated and original based on word frequency distribution
Translation of Title Lietuviškų tekstų klasifikavimas į verstinius ir originalius pagal jų žodžių dažnių skirstinius.
Authors Šimaitis, Gediminas
Pages 27
Keywords [eng] Translationese classification, support vector machines, Zipf's law, word frequency distribution
Abstract [eng] Translated text carries features that mark it as translated, and these features can be identified with statistical methods. Existing research has identified lexical density, vocabulary richness, and word length distribution among such marks. In this work, support vector machine models, which previous studies found effective for this purpose, are applied to corpora of Lithuanian monolingual texts. The models are then augmented with variables constructed to reflect the suggested marks of translated text, in an attempt to improve classification performance.
Dissertation Institution Vilniaus universitetas.
Type Master's thesis
Language English
Publication date 2017
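Illustrative note: the abstract names vocabulary richness and word length distribution as marks of translated text. The sketch below shows one way such variables might be computed from raw text; it is not the thesis's implementation. The function names `tokenize` and `stylometric_features`, the choice of type-token ratio and hapax ratio as richness measures, and the regex tokenizer are all assumptions made for illustration. Lexical density is omitted because it needs a content-word versus function-word distinction that the record does not describe.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters (hypothetical tokenizer).

    The pattern [^\\W\\d_]+ matches Unicode letters only, so Lithuanian
    characters such as "ž" or "ė" are kept inside words.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def stylometric_features(text):
    """Compute a few simple word-frequency features for one text.

    These are assumed stand-ins for the "marks of translated text"
    named in the abstract, not the variables used in the thesis.
    """
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")

    n = len(tokens)
    word_counts = Counter(tokens)                    # word frequency distribution
    length_counts = Counter(len(t) for t in tokens)  # word length distribution

    return {
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(word_counts) / n,
        # Share of tokens whose word occurs exactly once in the text.
        "hapax_ratio": sum(1 for c in word_counts.values() if c == 1) / n,
        "mean_word_length": sum(len(t) for t in tokens) / n,
        # Relative frequency of each word length, sorted by length.
        "length_distribution": {
            length: count / n for length, count in sorted(length_counts.items())
        },
    }


features = stylometric_features("Žodis ir žodis, ir tekstas.")
```

Values like these could be collected into one feature vector per text and passed to a support vector machine classifier, which is the kind of model the abstract describes.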