Abstract
Most users of foreign trade statistics have high requirements for both data quality and speed of delivery. Unfortunately, these two requirements often conflict, and it is impossible to check every observation. For this reason, a number of quality controls (credibility and validity checks) are implemented in order to focus on the most outlying observations and revise them. The objective of this work was to test the outlier detection methodology developed by Eurostat, to analyse the results and, possibly, to turn it into a new control in the Foreign Trade Statistics Division of Statistics Lithuania. The initial idea of the Eurostat methodology was to calculate a regression line using the ordinary least squares (OLS) method, to derive a prediction interval from it, and to flag all observations that lie outside this interval. Because the OLS method relies on certain assumptions, an alternative method was proposed for those time series that violate these assumptions and to which the OLS method therefore could not be applied. Before calculating the upper and lower limits for detecting outliers, it was also proposed to use weighted least squares regression in order to eliminate the effect of possible outlying observations in the data. The step of recalculating the weights and refitting the regression model was to be repeated three times. Afterwards, it was proposed to use an autoregressive error model to address autocorrelation, since the data are time series.
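The core of the procedure described above (fit a regression line, build a prediction interval, flag points outside it, optionally after three rounds of reweighting) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the Eurostat implementation. The abstract does not say which variables are regressed, so `x` and `y` are generic (in foreign trade one might, for instance, regress value on quantity, but that pairing is our assumption). The Tukey bisquare weighting function, the critical value of 1.96 (a normal approximation; a Student t quantile with n - 2 degrees of freedom would be more exact for short series) and the helper names `fit_line`, `bisquare_weights` and `flag_outliers` are all hypothetical. The alternative method and the autoregressive error correction are not covered.

```python
import numpy as np


def fit_line(x, y, w):
    """Weighted least squares fit of y = a + b*x (w = 1 everywhere gives OLS)."""
    sw = w.sum()
    xbar = (w * x).sum() / sw
    ybar = (w * y).sum() / sw
    sxx = (w * (x - xbar) ** 2).sum()
    b = (w * (x - xbar) * (y - ybar)).sum() / sxx
    a = ybar - b * xbar
    return a, b, xbar, sxx


def bisquare_weights(resid, c=4.685):
    """Tukey bisquare weights: an assumed weighting scheme, not Eurostat's."""
    # Robust scale estimate from the median absolute residual.
    scale = np.median(np.abs(resid)) / 0.6745
    if scale == 0:
        return np.ones_like(resid)
    u = resid / (c * scale)
    # Points far from the line (|u| >= 1) get zero weight.
    return np.where(np.abs(u) < 1, (1 - u ** 2) ** 2, 0.0)


def flag_outliers(x, y, n_reweight=0, crit=1.96):
    """Boolean mask of points outside an (approximate) prediction interval.

    n_reweight=0 gives the plain OLS interval; n_reweight=3 mimics the
    three reweighting rounds described in the text.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x)
    # Iteratively reweighted fit: refit, then recompute weights from residuals.
    for _ in range(n_reweight):
        a, b, _, _ = fit_line(x, y, w)
        w = bisquare_weights(y - (a + b * x))
    a, b, xbar, sxx = fit_line(x, y, w)
    resid = y - (a + b * x)
    sw = w.sum()
    if sw <= 2:
        raise ValueError("too little weight left to estimate the residual variance")
    # Residual standard error with (effective) n - 2 degrees of freedom.
    s = np.sqrt((w * resid ** 2).sum() / (sw - 2))
    # Half-width of the prediction interval at each x.
    half = crit * s * np.sqrt(1 + 1 / sw + (x - xbar) ** 2 / sxx)
    return np.abs(resid) > half


if __name__ == "__main__":
    x = np.arange(20.0)
    y = 2 + 3 * x + 0.5 * (-1.0) ** np.arange(20)  # small, regular noise
    y[10] += 50  # one artificial outlier
    print(np.flatnonzero(flag_outliers(x, y)))  # [10]
```

With `n_reweight=0` the function reduces to the standard OLS prediction interval, so the weighted and unweighted variants compared in the conclusions differ only in that one argument.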
After implementing the methodology described above and analysing the results, the following conclusions were made:
• Outlier detection is more effective when the weights are not used;
• Correcting the regression line for autocorrelation does not affect the results and is therefore unnecessary;
• Most of the outliers are found in time series that violate the OLS assumptions, yet violating these assumptions does not reduce the efficiency of the results, so the regression methodology can be used even for time series that violate the assumptions;
• The regression methodology performs more effectively than the alternative one;
• The majority of outliers are found in time series with high variation. However, not all of the detected "outliers" are real ones. This happens because the assumption that CN8 codes are homogeneous is only theoretical and does not hold in practice.
Based on these conclusions, we propose changing the structure of the outlier detection procedure.