Abstract [eng] |
This master’s thesis aims to determine the identity of different prose authors by analyzing five books by different authors. Several tasks were set to achieve this goal. The first task was to determine which linguistic features of the works are suitable for statistical analysis. To accomplish it, I prepared the texts of the books using the R software: separating words from punctuation marks, dividing sentences into words and letters, counting the words and letters, and determining the frequencies of repeated words and letters. I also carried out a statistical analysis of the resulting data on repeated words and on sentences of different lengths for each author. The second task was to investigate which methods of mathematical statistics are suitable for achieving the overall goal. I found that (linguo)statistical data analysis, correlation analysis, one-way analysis of variance, and computational statistics methods can be used to determine the identity of different prose writers. The third task was to find out how the authors differ from one another by applying the chosen methods of mathematical statistics. The results show that the letters t, h, ė, and u can be used to identify the fifth author, and that the letter č helps identify the fourth author. The letters c, z, d, k, m, p, and ž occur in equal proportions in the works of all authors and are therefore unsuitable for identifying any specific author. Testing hypotheses about the multinomial distribution showed that two authors can be distinguished by their writing style. |
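The text preparation described in the abstract (separating words from punctuation, splitting sentences into words and letters, and computing frequencies) can be illustrated with a minimal sketch. The thesis itself used R and its code is not reproduced here, so the Python below is only an assumed equivalent: the function names `tokenize`, `letter_percentages`, `word_frequencies`, and `sentence_lengths` are hypothetical, and splitting sentences on `.`, `!`, and `?` is a simplifying assumption.

```python
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Return lowercase word tokens, dropping punctuation and digits."""
    # NFC keeps letters such as 'ė', 'č', 'ž' as single code points.
    text = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches runs of Unicode letters only.
    return re.findall(r"[^\W\d_]+", text)


def letter_percentages(text):
    """Return each letter's share, in percent, of all letters in the text."""
    counts = Counter(ch for word in tokenize(text) for ch in word)
    total = sum(counts.values())
    return {ch: 100.0 * n / total for ch, n in counts.items()}


def word_frequencies(text):
    """Return a Counter mapping each word to its number of occurrences."""
    return Counter(tokenize(text))


def sentence_lengths(text):
    """Return the word count of each sentence (assumed to end in . ! or ?)."""
    lengths = []
    for sentence in re.split(r"[.!?]+", text):
        words = tokenize(sentence)
        if words:  # skip empty fragments left by the split
            lengths.append(len(words))
    return lengths


if __name__ == "__main__":
    sample = "Čia yra tekstas. Ėjo žmogus!"
    print(word_frequencies(sample).most_common(3))
    print(sentence_lengths(sample))
    print(sorted(letter_percentages(sample).items()))
```

Running such functions on each author's text would yield per-author letter percentages and sentence-length samples, which are the kind of inputs that the correlation analysis and one-way analysis of variance mentioned above operate on.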