Title Machine learning methods for automated data editing of the turnover of service enterprises
Translation of Title Mašininio mokymosi metodai automatizuotam paslaugų įmonių apyvartos redagavimui.
Authors Švambarytė, Klaudija
Pages 63
Keywords [eng] Data Editing ; Random Forest ; Classification ; Imputation ; National Statistical Institutes ; Error Detection ; Machine Learning
Abstract [eng] National Statistical Institutes (NSIs) face increasing pressure to streamline their data editing processes, as detecting and correcting erroneous entries requires substantial time and resources. This thesis investigates the application of a Random Forest-based framework to automate two critical tasks in the data editing workflow: identifying whether a reported value is erroneous, and then imputing the values identified as erroneous. The classification task demonstrated significant potential for accurately detecting erroneous records, enabling NSIs to focus their human and financial resources on the most critical cases. However, the imputation step faced challenges when predicting small or near-zero errors, particularly in cases that were wrongly classified as erroneous. Although several alternative modeling strategies were tested, none fully resolved these issues, which is consistent with findings from previous research.
Dissertation Institution Vilniaus universitetas.
Type Master's thesis
Language English
Publication date 2025