Title Sukčiavimo atvejų nustatymo automatizavimas finansinių paslaugų įmonėse
Translation of Title Automating fraud detection in financial services companies.
Authors Pranskūnaitė, Rugilė
Pages 86
Abstract [eng] This research paper aims to propose and implement an automated fraud detection model that can be used in financial services companies. The goal is pursued through five main tasks. First, the types and categories of fraud, the principles of fraud detection, the criteria for evaluating model performance, and related work by other authors are summarized. The workflow and methods used to develop a fully automated fraud detection model are then presented. The data used in the study were processed with the data preparation software package “Tableau Prep”, and the models were developed with the supervised machine learning tool H2O AutoML. The research also investigates how a new variable, obtained with the unsupervised machine learning algorithm “Isolation Forest”, affects the model metrics and the percentage distribution of the weights of the independent variables. Finally, the best automated fraud detection model is proposed. The results of this empirical research show that models often tend to overfit if they contain many variables. Before the additional independent variable was added, the most realistic model with the lowest overfitting rate included six independent variables and used the GBM algorithm with 140 trees, a maximum depth of 10, and a sample rate of 0.8; however, one variable was strongly dominant in terms of percentage weight. Adding the additional variable normalized the distribution of the weights of the independent variables, but no positive impact on model performance was observed. With the additional variable, the models overfitted more often, which is attributed to the better distribution of weights; it is therefore assumed that the dominance of one variable prevents the model from overfitting.
Another model, with a better distribution of variable weights, was proposed as the best and most realistic one. It was also obtained with the GBM algorithm, but it includes seven independent variables, including the additional “Isolation Forest score” feature, and uses 180 trees, a maximum depth of 8, and a sample rate of 0.8. The model predicted 96,472 normal cases and 1,369 fraud cases, with an error score of 25%. This research paper consists of 67 pages, 28 tables, and 13 figures.
Dissertation Institution Vilniaus universitetas.
Type Master thesis
Language Lithuanian
Publication date 2025