
Using Support Vector Machines to solve large-scale and complex real world text categorization problems

Schie, G. van (2004) Using Support Vector Machines to solve large-scale and complex real world text categorization problems. Master's Thesis / Essay, Computing Science.

Infor_Ma_2004_GvanSchie.CV.pdf - Published Version

Abstract

With the massive growth in the use of computers and the internet over the past decade, there has been an explosion in the volume of electronic documents and mail. As a result, people are increasingly unable to make use of all this information, and to keep it comprehensible it is necessary to order these documents into (hierarchical) categories. Classifying natural language text documents into a fixed number of predefined categories is called text categorization or text classification (TC), and is used for tasks such as email ordering and spam filtering, topic identification, document organization and Web searching (e.g. Yahoo!). However, the increasing amount of available information has also increased the size and complexity of these tasks. Doing them manually has therefore become very time-consuming and costly, which has resulted in an increasing demand for automatic text classifiers. In the past decade, a machine learning algorithm called Support Vector Machines (SVM) has gained a lot of popularity for constructing automatic text classifiers. In this thesis I will carry out an in-depth study of automatic text classification in general, and of Support Vector Machines in particular. Based on this study I will set up a research project that examines the choices that can be made in the design process of an automatic text classifier, and conduct it on a large-scale and complex real world problem to see which choices work best for such problems. The problem used in this research is the text classification task of 2ehands.nl. Based on the results, a guide will be created with which such problems can be solved.

Item Type: Thesis (Master's Thesis / Essay)
Degree programme: Computing Science
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 15 Feb 2018 07:30
Last Modified: 15 Feb 2018 07:30
URI: http://fse.studenttheses.ub.rug.nl/id/eprint/8892
