Abstract
National Statistical Institutes (NSIs) face increasing pressure to streamline their data editing processes, as detecting and correcting erroneous entries requires substantial time and resources. This thesis investigates the application of a Random Forest-based framework to automate two critical tasks in the data editing workflow: identifying whether a reported value is erroneous, and then imputing a corrected value. The classification task showed strong potential for accurately detecting erroneous records, enabling NSIs to focus their human and financial resources on the most critical cases. The imputation step, however, struggled to predict small or near-zero errors, particularly for records that were wrongly classified as erroneous. Although several alternative modeling strategies were tested, none fully resolved these issues, in line with findings from previous research.
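The two-step structure described above can be illustrated with a minimal sketch: a Random Forest classifier flags records as erroneous, and a Random Forest regressor imputes replacement values for the flagged records. The dataset, variable names, and scikit-learn configuration below are illustrative assumptions, not the exact pipeline used in the thesis.

```python
# Illustrative sketch of the two-step editing workflow described in the abstract.
# All data and column choices are synthetic assumptions for demonstration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical survey data: auxiliary features X, reported values,
# a known error flag, and the "true" (edited) values available for training.
n = 1000
X = rng.normal(size=(n, 5))
true_values = X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5]) + rng.normal(scale=0.1, size=n)
is_error = rng.random(n) < 0.2
reported = np.where(is_error, true_values + rng.normal(scale=5.0, size=n), true_values)

features = np.column_stack([X, reported])
X_train, X_test, err_train, err_test, y_train, y_test = train_test_split(
    features, is_error, true_values, random_state=0
)

# Step 1: classification - flag which reported values appear erroneous.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, err_train)
flagged = clf.predict(X_test).astype(bool)

# Step 2: imputation - predict replacement values, trained on known-erroneous records.
reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train[err_train], y_train[err_train])
if flagged.any():
    imputed = reg.predict(X_test[flagged])
```

In such a setup, records misclassified as erroneous in step 1 are passed to the regressor even though their reported values need little or no correction, which mirrors the difficulty with small or near-zero errors noted above.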