Title Analysis of statistical and machine learning techniques for predicting individual credit rating based on Lithuanian data
Translation of Title Statistinių ir mašininio mokymosi metodų, skirtų individualaus kredito reitingui prognozuoti, analizė naudojant Lietuvos duomenis.
Authors Neifaltas, Airidas
Pages 64
Keywords [eng] Credit score prediction, Machine learning, Risk analysis, Resampling techniques, Binary classification, Multiclass classification.
Abstract [eng] Credit risk assessment is one of the most crucial elements in the financial sector. This study is based on applicant data from a Lithuanian loan-comparison platform. Loan-comparison platforms are innovative financial intermediaries that allow consumers to compare several loan offers and choose the most favourable one. The study presents a comprehensive analysis of statistical and machine learning techniques for predicting individual credit scores using Lithuanian data. Five algorithms are included in the analysis: Logistic Regression, Support Vector Machine, Random Forest, XGBoost and Artificial Neural Networks. To deal with the high class imbalance, the resampling techniques Random Oversampling and SMOTE are also applied. In addition, the analysis covers binary, 3-Class and 5-Class classification problems. To the best of the author’s knowledge, it is the first study in the credit scoring field that combines the results from both binary and multiclass classification problems. Performance is assessed using several evaluation metrics: Accuracy, AUC, Training time, Precision (Macro Precision), Recall (Macro Recall), False Positive rate and False Negative rate. A final scoring method is proposed to combine the results of the different classification experiments. Based on this final scoring method, Random Forest and XGBoost were the best-performing algorithms in predicting individual credit scores. Empirical results revealed that the best algorithms perform comparatively better in the 3-Class classification problem than in the binary case. Furthermore, the resampling techniques help to predict the minority (risky) class significantly better: the XGBoost algorithm with Random Oversampling reaches an Accuracy of 0.804 in predicting the riskiest (E) class in the 3-Class classification problem.
Finally, this study provides recommendations for applying the credit scoring system in the Lithuanian loan-comparison platform. Implementing a credit scoring system could help reduce the possibility of lending to a risky customer and avoid additional expenses on external credit scoring agencies, which may lack complete information about the customer.
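As a rough illustration of two ideas named in the abstract, the sketch below shows Random Oversampling and the macro-averaged Precision and Recall metrics in plain Python. It is not the thesis's implementation: the function names `random_oversample` and `macro_precision_recall`, the class labels and the toy data are all hypothetical, and the sketch assumes that oversampling duplicates minority-class rows until every class matches the largest one.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all
    classes have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of per-class precision and per-class recall."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy, made-up data: three classes with a strong imbalance.
X_toy = [[i] for i in range(10)]
y_toy = ["A"] * 6 + ["C"] * 3 + ["E"] * 1
X_bal, y_bal = random_oversample(X_toy, y_toy)
balanced_counts = Counter(y_bal)

macro_p, macro_r = macro_precision_recall(
    ["A", "A", "C", "E"], ["A", "C", "C", "E"]
)
```

In a setup like this, resampling would normally be applied to the training split only, so that the evaluation metrics are computed on data with the original class distribution.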
Dissertation Institution Vilniaus universitetas.
Type Master thesis
Language English
Publication date 2023