
Classifying Web Pages Using a Support Vector Machine

Schaftenaar, H (2012) Classifying Web Pages Using a Support Vector Machine. Bachelor's Thesis, Artificial Intelligence.

AI-BA-2012-H.SCHAFTENAAR.pdf - Published Version (221kB)
AkkoordWieringSchaftenaar.pdf - Other (32kB), restricted to repository staff only

Abstract

The World Wide Web keeps expanding at an enormous rate: tens of thousands of new pages are added daily as it becomes easier for everybody to share information. However, information is only useful when it can be retrieved. Search engines are used for this retrieval, but they are losing precision because of the ongoing expansion of the web. To regain precision, search engines can be enhanced with category filters. However, the web is too big to be classified by hand. This article explores whether web pages can be classified into basic categories using Support Vector Machines, a machine learning technique. Machine learning techniques require a pre-classified training set in order to function, and manually classifying web pages for the training set can take a lot of time. To tackle this problem, home pages that have already been classified in a web directory are used as the training set. Even though the training set is not verified, the classifier scores very well. This is evidence that enhancing search engines with a category filter is not a difficult task. Because users of search engines unintentionally give feedback about the accuracy of the category filter, they help to improve it simply by using the search engine. In this way more and more of the web can be categorized, and in greater detail, so that information becomes more easily retrievable.
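
The approach described in the abstract (train on pages that a web directory has already labelled, then assign a category to unseen pages) can be illustrated with a minimal sketch. This record does not state the thesis's features, kernel, or SVM implementation, so everything below is an assumption made for illustration: bag-of-words features, a linear SVM trained with a Pegasos-style sub-gradient method, a one-vs-rest scheme for multiple categories, and a tiny invented set of "directory" pages. The function names are hypothetical.

```python
import random
import re
from collections import Counter


def featurize(text):
    """Turn page text into L2-normalised bag-of-words term counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = sum(c * c for c in counts.values()) ** 0.5 or 1.0
    return {term: c / norm for term, c in counts.items()}


def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(t, 0.0) * v for t, v in features.items())


def train_binary_svm(samples, labels, epochs=50, lam=0.01, seed=0):
    """Linear SVM (hinge loss, no bias term) trained with Pegasos-style
    stochastic sub-gradient descent. Labels must be +1 or -1."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            margin = labels[i] * dot(weights, samples[i])
            shrink = 1.0 - eta * lam  # regularisation step
            for term in weights:
                weights[term] *= shrink
            if margin < 1.0:  # sample violates the margin: hinge update
                for term, value in samples[i].items():
                    weights[term] = weights.get(term, 0.0) + eta * labels[i] * value
    return weights


def train_one_vs_rest(pages):
    """Train one binary SVM per category from (text, category) pairs."""
    samples = [featurize(text) for text, _ in pages]
    categories = sorted({category for _, category in pages})
    return {
        category: train_binary_svm(
            samples, [1 if c == category else -1 for _, c in pages]
        )
        for category in categories
    }


def classify(models, text):
    """Return the category whose SVM gives the highest score."""
    features = featurize(text)
    return max(models, key=lambda category: dot(models[category], features))


# Invented stand-ins for home pages already labelled by a web directory.
directory_pages = [
    ("football league team match goal score", "sports"),
    ("tennis player match tournament champion", "sports"),
    ("buy cheap shoes online store discount", "shopping"),
    ("online shop sale price cart checkout", "shopping"),
    ("government election minister parliament vote", "news"),
    ("election headlines president policy report", "news"),
]

models = train_one_vs_rest(directory_pages)
print(classify(models, "cheap shoes online with fast checkout"))
```

In a real setting the training pairs would come from the directory's category listings, and a maintained SVM library would normally replace the hand-written trainer; the sketch only shows how unverified directory labels can serve directly as training data.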

Item Type: Thesis (Bachelor's Thesis)
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 15 Feb 2018 07:50
Last Modified: 15 Feb 2018 07:50
URI: http://fse.studenttheses.ub.rug.nl/id/eprint/10392
