
Classification System for Mortgage Arrear Management

Sun, Zhe (2013) Classification System for Mortgage Arrear Management. Master's Thesis / Essay, Computing Science.

thesis_ZheSun_CS_master.pdf (text, published version, 2MB)
SunAkkoordPetkov.pdf (text, other, 44kB; restricted to repository staff only)

Abstract

ING Domestic Bank holds around 22% of the Dutch mortgage market. Mortgage customers normally pay interest or a deposit every month. However, a considerable number of customers pay late or default for one or even several months, which causes substantial losses for ING. The Arrears department manages arrears on mortgage payments and contacts defaulters by letter, SMS, email or phone. Compared with the existing working process, the department aims to treat defaulters more intensively, pushing them to repay as soon as possible, while keeping operational costs at the current level. We develop a classification model that predicts the behaviour of mortgage customers who were healthy in the previous month but have not paid at the beginning of the current month. The model assigns one of two labels: delayers, who pay late but within one month, and defaulters, who still have not paid by the end of the month. In this way, the Arrears department can reserve intensive treatment for defaulters, the customers who genuinely have payment problems. For this project, data on 400,000 customers with more than 2,000 features were collected from the ING data warehouse. Feature selection and data preprocessing are performed first. We then train several popular basic classifiers (KNN, Naive Bayes, decision trees and logistic regression) as well as ensemble methods (bagging, random forests, boosting, voting and stacking). Since the two classes are highly imbalanced (the ratio of defaulters to delayers is around 1:9), we discuss evaluation metrics suited to learning from skewed data and use the area under the ROC curve (AUC) to compare the classifiers. The impact of sampling techniques is also studied empirically. Our experiments show that ensemble methods improve markedly on the basic classifiers, and that symmetric sampling improves classification performance.
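Why accuracy is a poor yardstick at a 1:9 class ratio, and what rebalancing looks like, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's code: the toy labels (1 = defaulter, 0 = delayer), the function names, and the reading of "symmetric sampling" as random undersampling of the majority class to equal class sizes are all assumptions. A classifier that calls everyone a delayer reaches 90% accuracy but only 0.5 AUC; AUC is computed here through its rank interpretation, the probability that a randomly chosen defaulter scores higher than a randomly chosen delayer.

```python
import random
from typing import Dict, List, Sequence, Tuple


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive (label 1) outscores a
    random negative (label 0); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def undersample_majority(rows: Sequence, labels: Sequence[int],
                         seed: int = 0) -> Tuple[list, List[int]]:
    """Assumed form of symmetric sampling: randomly keep, without replacement,
    only as many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    by_class: Dict[int, list] = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows: list = []
    out_labels: List[int] = []
    for y, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 90           # toy data: 1 = defaulter, 0 = delayer (1:9)
    print(accuracy(labels, [0] * 100))     # 0.9: "everyone is a delayer"
    print(roc_auc(labels, [0.0] * 100))    # 0.5: no ranking ability at all
    _, balanced = undersample_majority(list(range(100)), labels)
    print(balanced.count(1), balanced.count(0))  # 10 10
```

The pairwise loop in `roc_auc` is quadratic in the number of customers; at the scale of 400,000 customers a sort-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.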
Balanced random forests were chosen to build the model for the Arrears department; the resulting model achieves an AUC of around 0.772. It has been deployed in the daily work of the Arrears department of ING Domestic Bank since June 2013. Finally, a cost matrix analysis and a feature importance ranking are carried out to guide the department's daily work and to give deeper insight into the problem. By a conservative estimate, using the model together with the new working process saves ? euro (the figure is withheld for confidentiality) in risk cost per month.
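The two techniques named in this paragraph can be sketched as follows, again as a hedged illustration rather than the deployed model. `balanced_bootstrap` draws the training indices for one tree of a balanced random forest in its usual formulation (a bootstrap sample of the minority class plus an equally sized sample, drawn with replacement, from the majority class); the tree learner itself is omitted. `cost_threshold` and `total_cost` show one common way to use a cost matrix: flag a customer as a defaulter when the expected cost of missing them exceeds the expected cost of treating a delayer intensively, i.e. when p·C_FN > (1 − p)·C_FP, which rearranges to p > C_FP / (C_FP + C_FN). All function names are hypothetical, and the cost values in the usage lines are invented because the real figures are confidential.

```python
import random
from typing import Dict, List, Sequence


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Training indices for one tree of a balanced random forest.

    Draws, with replacement, as many indices from every class as the minority
    class has members, so each tree sees a 1:1 class ratio.
    """
    by_class: Dict[int, List[int]] = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample: List[int] = []
    for idx in by_class.values():
        sample.extend(rng.choice(idx) for _ in range(n_min))
    return sample


def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Defaulter probability above which flagging has the lower expected cost.

    Flag when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    cost_fp: cost of treating a delayer intensively (false positive).
    cost_fn: cost of missing a defaulter (false negative).
    """
    return cost_fp / (cost_fp + cost_fn)


def total_cost(labels: Sequence[int], probs: Sequence[float],
               threshold: float, cost_fp: float, cost_fn: float) -> float:
    """Total misclassification cost of thresholded predictions (1 = defaulter)."""
    cost = 0.0
    for y, p in zip(labels, probs):
        flagged = p > threshold
        if flagged and y == 0:
            cost += cost_fp
        elif not flagged and y == 1:
            cost += cost_fn
    return cost


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 90                     # toy 1:9 data
    idx = balanced_bootstrap(labels, random.Random(0))
    print(len(idx), sum(labels[i] for i in idx))     # 20 10
    # Hypothetical costs: missing a defaulter costs 9x an unnecessary treatment.
    t = cost_threshold(1.0, 9.0)
    print(t)                                         # 0.1
    print(total_cost([1, 0, 0], [0.05, 0.5, 0.01], t, 1.0, 9.0))  # 10.0
```

Training many trees on such balanced samples and averaging their votes gives the forest; in practice a library implementation (for example imbalanced-learn's `BalancedRandomForestClassifier`) would replace the hand-written sampler.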

Item Type: Thesis (Master's Thesis / Essay)
Degree programme: Computing Science
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 15 Feb 2018 07:56
Last Modified: 15 Feb 2018 07:56
URI: http://fse.studenttheses.ub.rug.nl/id/eprint/11525
