Abstract [eng] |
This work forecasts partial recovery rates for non-performing loans. Using real debt portfolio data provided by a debt collection company, partial recovery rates after portfolio acquisition were modelled with one-stage models, two-stage models and a model ensemble. The one-stage algorithms were support vector machines, random forests, k-nearest neighbors, artificial neural networks and extreme gradient boosting. The two-stage models first classified recovery rates as zero or non-zero and then applied regression to the observations greater than zero. The classification methods were binary logistic regression and random forests; the second stage used the same regression algorithms as the one-stage models. In addition, zero-and-one-inflated beta regression was implemented, modelling the values zero and one with a Bernoulli random variable and all other values with beta regression. The best individual model on the test data was one-stage random forests. A stacking ensemble was built from the three best-performing models: random forests, extreme gradient boosting and artificial neural networks. The ensemble, however, outperformed one-stage random forests only marginally. The most significant variables were pre-acquisition collections and the debt amount at loan default. Additionally, a monthly forecasting tool was implemented using a loop of random forest models. |
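The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the work: a small pure-Python k-nearest-neighbors helper stands in for both stages (the abstract names logistic regression and random forests as the stage-one classifiers), and the class name, the 0.5 cut-off, the clipping to [0, 1], the feature layout and the toy data are all assumptions made for illustration.

```python
# Minimal two-stage sketch: stage one separates zero from non-zero recovery
# rates, stage two regresses only on the non-zero observations.
# Assumption: a tiny pure-Python k-NN stands in for both stages.


def knn_predict(train_x, train_y, x, k=3):
    """Average the targets of the k training rows closest to x."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    nearest = [target for _, target in by_distance[:k]]
    return sum(nearest) / len(nearest)


class TwoStageRecoveryModel:
    """Hypothetical wrapper: classify zero vs. non-zero, then regress."""

    def __init__(self, k=3, threshold=0.5):
        self.k = k
        self.threshold = threshold  # assumed cut-off for "non-zero recovery"

    def fit(self, X, y):
        self.X_all = [list(row) for row in X]
        # Stage-one target: 1.0 if anything was recovered, else 0.0.
        self.is_positive = [1.0 if v > 0 else 0.0 for v in y]
        # Stage-two data: only observations with a recovery rate above zero.
        self.X_pos = [row for row, v in zip(self.X_all, y) if v > 0]
        self.y_pos = [v for v in y if v > 0]
        return self

    def predict_one(self, x):
        # Stage one: share of non-zero recoveries among the nearest neighbors.
        p_positive = knn_predict(self.X_all, self.is_positive, x, self.k)
        if p_positive < self.threshold or not self.X_pos:
            return 0.0
        # Stage two: regression fitted on non-zero observations only.
        rate = knn_predict(self.X_pos, self.y_pos, x, self.k)
        return min(max(rate, 0.0), 1.0)  # keep the rate inside [0, 1]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


# Toy data (invented): columns are [pre-acquisition collections, debt amount].
X_train = [[0.0, 5.0], [0.1, 4.0], [0.2, 6.0],
           [2.0, 1.0], [2.2, 1.5], [2.4, 0.8]]
y_train = [0.0, 0.0, 0.0, 0.3, 0.5, 0.4]

model = TwoStageRecoveryModel(k=3).fit(X_train, y_train)
predictions = model.predict([[0.1, 5.0], [2.1, 1.0]])
```

In this sketch the first query falls among loans with no recovery and is predicted as zero, while the second is passed on to the stage-two regression. Any classifier and regressor could replace the k-NN helper without changing the wiring.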