Zhe (2013) Classification System for Mortgage Arrear Management. Master's Thesis / Essay, Computing Science.
Abstract
ING Domestic Bank holds around 22% of the Dutch mortgage market. Normally, mortgage customers pay the interest or deposit monthly. However, a considerable number of customers repay late, or default for one or even several months, which causes tremendous losses for ING. The Arrears department manages arrears on mortgage payments and contacts defaulters by letter, SMS, email or phone call. Compared with the existing working process, the Arrears department aims to make the treatments more intensive in order to push defaulters to repay as soon as possible, while keeping the operational cost at its current level. We develop a classification model to predict the behaviour of mortgage customers who were healthy in the previous month but have not paid their debt at the beginning of the current month. Our model assigns one label with two possible values: delayers, who pay late but within one month, and defaulters, who have not paid even by the end of the month. In this way, the Arrears department can reserve intensive treatment for defaulters, who really have payment problems. In this project, data on 400,000 customers with more than 2,000 features are collected from the ING data warehouse. Feature selection and data preprocessing are carried out first. Then we train several popular basic classifiers, such as KNN, Naive Bayes, decision trees and logistic regression, as well as ensemble methods such as bagging, random forests, boosting, voting and stacking. Since the two classes are highly imbalanced (the ratio of defaulters to delayers is around 1:9), we discuss evaluation metrics for learning from skewed data. The area under the ROC curve (AUC) is used to compare the results of the different classifiers. In addition, the impact of sampling techniques is studied empirically. Our experiments show that ensemble methods improve the performance of the basic classifiers remarkably. We also conclude that symmetric sampling improves classification performance.
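To illustrate why accuracy is a poor yardstick at a class ratio of about 1:9, and why the area under the ROC curve is used instead, the following is a minimal sketch in plain Python. The toy labels and scores are invented for illustration and are not taken from the thesis; the function names `auc` and `accuracy` are hypothetical helpers, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than whatever tooling the author used.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a defaulter (positive class), 0 for a delayer.
    scores: higher means "more likely to be a defaulter".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions when scores are cut at `threshold`."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy data (illustrative only): 1 defaulter and 9 delayers, i.e. a 1:9 ratio.
labels = [1] + [0] * 9

# A useless model that always predicts "delayer".
constant_scores = [0.0] * 10

# A model that ranks the defaulter above 8 of the 9 delayers.
informative_scores = [0.8, 0.1, 0.2, 0.9, 0.3, 0.1, 0.2, 0.4, 0.3, 0.1]

acc_constant = accuracy(labels, constant_scores)
auc_constant = auc(labels, constant_scores)
acc_informative = accuracy(labels, informative_scores)
auc_informative = auc(labels, informative_scores)

print(acc_constant, auc_constant)
print(acc_informative, auc_informative)
```

In this toy case both models reach the same accuracy, while only the AUC separates the model that never flags a defaulter from the one that ranks the defaulter above most delayers.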
A balanced random forest is chosen to build the model for the Arrears department; it gives an AUC of around 0.772. The model has been deployed in the daily work of the Arrears department of ING Domestic Bank since June 2013. Finally, cost matrix analysis and feature importance ranking are studied in order to guide the daily work of the Arrears department and to give deeper insight into the problem. By a conservative estimate, a risk cost of ? euro per month (the figure is withheld for confidentiality reasons) can be saved by using the model and the new working process.
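The sampling idea behind a balanced random forest can be sketched as follows. This is an assumption-laden illustration, not the thesis implementation: this page does not state the exact sampling scheme, so the sketch uses the commonly described variant in which each tree is trained on a bootstrap sample containing equally many examples of each class. The helper name `balanced_bootstrap` and the toy class counts are hypothetical.

```python
import random


def balanced_bootstrap(labels, rng):
    """Return the indices of one class-balanced bootstrap sample.

    For every class, draw (with replacement) as many indices as the
    smallest class contains, so each class is equally represented.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n))  # sampling with replacement
    rng.shuffle(sample)
    return sample


# Toy labels (illustrative only): 100 defaulters (1) and 900 delayers (0).
labels = [1] * 100 + [0] * 900

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = balanced_bootstrap(labels, rng)

n_defaulters = sum(labels[i] for i in sample)
print(len(sample), n_defaulters)
```

In a full forest, one such sample would be drawn per tree, so every tree sees the minority class as often as the majority class while the ensemble as a whole still covers most of the majority examples.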
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Degree programme: | Computing Science |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 15 Feb 2018 07:56 |
Last Modified: | 15 Feb 2018 07:56 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/11525 |