Chen, Z (2012) Supervised Feature Selection Based on Generalized Matrix Learning Vector Quantization. Master's Thesis / Essay, Artificial Intelligence.
Abstract
Data mining uses data analysis tools to discover and extract information from a data set and transform it into an understandable form. One of its central problems is to identify a representative subset of features from which a learning model can be constructed. Feature selection is an important pre-processing step that aims to select a representative subset of features with high predictive information and to eliminate irrelevant features that contribute little to classification. By reducing the dimensionality of the data, feature selection shortens training time, and by keeping the most relevant features while removing irrelevant and noisy ones, it may improve classification performance. In addition, a model learned from a smaller feature subset may be more intuitive and easier to interpret. This thesis investigates the extension of the Generalized Matrix Learning Vector Quantization (GMLVQ) model to feature selection. GMLVQ employs a full matrix as the distance metric in training. The diagonal and off-diagonal elements of this matrix measure the contribution to classification of each feature and each feature pair, respectively; their distribution can therefore provide a quantitative measure of feature weight. Further steps and analysis are applied to obtain a more effective feature selection result and to remove the weighting ambiguity. Moreover, whereas other methods rank features first and learn a model only after the feature subset has been selected, GMLVQ-based methods can combine feature ranking and classification in one process, which helps to decrease computation time. The experiments in this thesis were performed on data sets from the UCI Machine Learning Repository. The GMLVQ-based feature weighting algorithm is compared with three other state-of-the-art methods: Information Gain, Fisher and ReliefF. All four feature ranking methods are evaluated with both GMLVQ and RBF-based Support Vector Machine (RBF-SVM) classifiers while the size of the selected feature subset is increased by a fixed step size. The results indicate that the GMLVQ-based feature selection method performs comparably to the other methods and, on some of the data sets, consistently outperforms them.
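To make the role of the distance matrix concrete, the following is a minimal sketch, not the thesis implementation. It assumes the parameterisation commonly used for GMLVQ, in which the relevance matrix is Λ = ΩᵀΩ, normalised to unit trace, and the distance between a sample x and a prototype w is (x − w)ᵀ Λ (x − w). The function names, the toy matrix `omega`, and the step size are illustrative assumptions; the additional steps the thesis uses to remove the weighting ambiguity are not reproduced here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def relevance_matrix(omega: Matrix) -> Matrix:
    """Return Lambda = Omega^T Omega, normalised so that its trace is 1."""
    n_rows, n_feat = len(omega), len(omega[0])
    lam = [
        [sum(omega[k][i] * omega[k][j] for k in range(n_rows)) for j in range(n_feat)]
        for i in range(n_feat)
    ]
    trace = sum(lam[i][i] for i in range(n_feat))
    return [[value / trace for value in row] for row in lam]


def distance(x: Sequence[float], w: Sequence[float], lam: Matrix) -> float:
    """Adaptive squared distance (x - w)^T Lambda (x - w)."""
    diff = [a - b for a, b in zip(x, w)]
    n = len(diff)
    return sum(diff[i] * lam[i][j] * diff[j] for i in range(n) for j in range(n))


def rank_features(lam: Matrix) -> List[int]:
    """Order feature indices by decreasing diagonal relevance Lambda_ii."""
    return sorted(range(len(lam)), key=lambda i: lam[i][i], reverse=True)


def nested_subsets(ranking: Sequence[int], step: int) -> List[List[int]]:
    """Growing feature subsets: top `step` features, top 2*step, ..., all."""
    n = len(ranking)
    sizes = list(range(step, n, step)) + [n]
    return [list(ranking[:k]) for k in sizes]


# Toy example with made-up values for Omega (hypothetical, not from the thesis).
omega = [
    [0.1, 0.0, 0.0],
    [0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0],
]
lam = relevance_matrix(omega)
ranking = rank_features(lam)
subsets = nested_subsets(ranking, step=2)
```

Each subset in `subsets` could then be passed to a classifier of choice (GMLVQ or an RBF-SVM in the experiments described above) to trace accuracy as a function of subset size. Only the diagonal of Λ is used for ranking in this sketch; the off-diagonal entries, which describe feature pairs, still enter the distance.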
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Degree programme: | Artificial Intelligence |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 15 Feb 2018 07:51 |
Last Modified: | 15 Feb 2018 07:51 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/10614 |