Abstract
National Statistical Institutes (NSIs) face increasing pressure to streamline their data editing processes, as detecting and correcting erroneous entries requires substantial time and resources. This thesis investigates the application of a Random Forest-based framework to automate two critical tasks in the data editing workflow: identifying whether a reported value is erroneous, and then imputing a corrected value. The classification task showed strong potential for accurately detecting erroneous records, enabling NSIs to focus their human and financial resources on the most critical cases. The imputation step, however, struggled to predict small or near-zero errors, particularly for records that were wrongly classified as erroneous. Although several alternative modeling strategies were tested, none fully resolved these issues, in line with findings from previous research.
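The two-step structure described above can be illustrated with a minimal sketch: a Random Forest classifier flags records as erroneous, and a Random Forest regressor imputes replacement values for the flagged records. The dataset, variable names, and scikit-learn configuration below are illustrative assumptions, not the exact pipeline used in the thesis.

```python
# Illustrative sketch of the two-step editing workflow described in the abstract.
# All data and column choices are synthetic assumptions for demonstration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical survey data: auxiliary features X, reported values,
# a known error flag, and the "true" (edited) values available for training.
n = 1000
X = rng.normal(size=(n, 5))
true_values = X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5]) + rng.normal(scale=0.1, size=n)
is_error = rng.random(n) < 0.2
reported = np.where(is_error, true_values + rng.normal(scale=5.0, size=n), true_values)

features = np.column_stack([X, reported])
X_train, X_test, err_train, err_test, y_train, y_test = train_test_split(
    features, is_error, true_values, random_state=0
)

# Step 1: classification - flag which reported values appear erroneous.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, err_train)
flagged = clf.predict(X_test).astype(bool)

# Step 2: imputation - predict replacement values, trained on known-erroneous records.
reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train[err_train], y_train[err_train])
if flagged.any():
    imputed = reg.predict(X_test[flagged])
```

In such a setup, records misclassified as erroneous in step 1 are passed to the regressor even though their reported values need little or no correction, which mirrors the difficulty with small or near-zero errors noted above.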