Abstract [eng] |
The main purpose of this master's thesis is to improve the classical chi-square goodness-of-fit test so that it depends less on the rule by which the data are divided into intervals and on the hypothetical probabilities. The tasks of the thesis are the following:
• To examine, following the literature ([1], [2], [3]), the theoretical and applied aspects of constructing the chi-square test;
• To analyse methods for representing the empirical characteristics of the data graphically and comparing them with their theoretical equivalents, and to review the ways of dividing the data into intervals (choosing their number and length);
• To calculate the mean, variance and third central moment of the chi-square test statistic and to analyse their properties;
• To create a SAS macro for computing these three moments;
• To create a SAS macro for simulating the statistic and to construct a goodness-of-fit test from the simulated values;
• To construct an approximate modified chi-square goodness-of-fit test based on a new statistic;
• To compare the classical and modified goodness-of-fit tests using the simulated values.
The analysis of graphs of the empirical data and their comparison with the theoretical equivalents showed that the distribution (density) is recognised more easily when the intervals are chosen so that an observation is equally likely to fall into each of them. With such equiprobable intervals the computations are also simpler, and the mean, variance and third central moment of the statistic are closer to their theoretical equivalents. The mean, variance and third central moment of the statistic were analysed as functions of the sample size. The analysis showed that for small samples the statistic needs a correction, since the empirical values of the moments are rather far from their theoretical equivalents.
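The small-sample behaviour of the moments can be checked on a toy case. The following Python sketch is not the SAS macro of the thesis. It enumerates every possible vector of interval counts for a small sample and computes the exact mean, variance and third central moment of Pearson's statistic under the tested hypothesis. The function names (`compositions`, `multinomial_pmf`, `pearson_statistic`, `exact_moments`) and the use of exhaustive enumeration instead of simulation are assumptions made for illustration; enumeration is feasible only for small samples and few intervals.

```python
from math import factorial, prod


def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` under interval probabilities `probs`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # each step stays an integer
    return coef * prod(p ** c for p, c in zip(probs, counts))


def pearson_statistic(counts, probs):
    """Classical chi-square statistic: sum of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def exact_moments(n, probs):
    """Exact mean, variance and third central moment of the statistic under H0."""
    outcomes = [
        (pearson_statistic(c, probs), multinomial_pmf(c, probs))
        for c in compositions(n, len(probs))
    ]
    mean = sum(x * w for x, w in outcomes)
    variance = sum((x - mean) ** 2 * w for x, w in outcomes)
    third = sum((x - mean) ** 3 * w for x, w in outcomes)
    return mean, variance, third
```

For example, `exact_moments(5, [1/3, 1/3, 1/3])` gives a mean of 2, which equals the mean of the limiting chi-square distribution with 2 degrees of freedom, but a variance of 3.2 instead of the limiting value 4. This is the kind of small-sample discrepancy that motivates a correction of the statistic.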
Based on these findings, a modification of the statistic was proposed. In addition, one more test was proposed, based only on the simulated values of the statistic and the p-value. Afterwards, the three tests (the classical chi-square test, the modified chi-square goodness-of-fit test and the test based on the simulated values) were compared by comparing the probabilities obtained from each of them. The results were the following:
• The test based on the simulated values is rather precise and can be sufficient for testing a hypothesis if enough values are simulated;
• The correction of the statistic makes the goodness-of-fit test slightly more accurate both when the tested hypothesis is true and when it is false;
• As the sample size increases, the accuracies of the tests become almost equal, and the question of which test to apply is no longer relevant;
• The more hypothetical probabilities are included in the test, the more precise the results; the same holds when the hypothetical probabilities are equal. |
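A test that relies only on simulated values of the statistic can be sketched as follows. This is a minimal Python illustration, not the SAS macro of the thesis: samples are drawn under the tested hypothesis, the classical statistic is computed for each, and the p-value is estimated as the share of simulated values at least as large as the observed one. The function names, the default number of simulations, the fixed seed and the `(exceed + 1) / (n_sim + 1)` convention are assumptions made for illustration.

```python
import random


def pearson_statistic(counts, probs):
    """Classical chi-square statistic: sum of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def simulate_counts(n, probs, rng):
    """Draw n observations under H0 and count how many fall into each interval."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[idx] += 1
    return counts


def monte_carlo_p_value(observed_counts, probs, n_sim=2000, seed=0):
    """Estimate the p-value from simulated values of the statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = sum(observed_counts)
    observed = pearson_statistic(observed_counts, probs)
    exceed = sum(
        pearson_statistic(simulate_counts(n, probs, rng), probs) >= observed
        for _ in range(n_sim)
    )
    # Counting the observed value among the simulated ones avoids a p-value of 0.
    return (exceed + 1) / (n_sim + 1)
```

A call such as `monte_carlo_p_value([12, 9, 9], [1/3, 1/3, 1/3])` returns the estimated p-value for three equiprobable intervals. The precision of the estimate depends on `n_sim`, in line with the observation above that the simulation-based test is adequate only when enough values are simulated.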