Hafkenscheid, J.M. (2010) FOCUSED SURFER MODELS: Ranking visual search results. Master's Thesis / Essay, Computing Science.
Abstract
PageRank is a graph-based ranking algorithm that ranks the nodes in a graph based on their connections. It was designed to determine the intrinsic value of a page from the link structure of the web. We research the effects of adaptations to the PageRank algorithm in the context of a gallery search engine. The search engine uses text-based search to select galleries. Galleries are web pages that contain a number of small images, each of which links to an enlarged version. We evaluate performance by looking at the ordering of the search results. The generic PageRank algorithm does not work well for this kind of application; it seems that the best galleries are not found on the pages with the highest PageRank. PageRank assumes that all links convey trust to other pages, but many links disrupt that concept (e.g. links to download or update your browser or required plug-ins). At the core of the algorithm is the random surfer model: a virtual user who navigates the web by following a random link on each page and repeating this indefinitely. The PageRank algorithm can be changed by altering the probability with which the random surfer chooses which link to follow. Our design choice is to replace the random surfer model with a focused surfer model, which increases the probability of following a link based on the similarity between the linking page and the target. We analyze the performance with the gallery search engine and compare the results with a generic PageRank and a uniform ranking. The search engine uses a dataset of 15 million galleries; to acquire this dataset we crawled 1.5 billion pages. We find that these alterations to the system affect the outcome and that overall performance increases. The PageRank algorithm is intended for information retrieval and works very well for general-purpose ranking, but the random surfer model does not work well enough for all applications.
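The difference between the two surfer models can be illustrated with a minimal sketch. This is not the implementation from the thesis: the damping factor of 0.85, the Jaccard similarity over term sets, the handling of pages without usable links, and the toy graph with its page names are all assumptions chosen for illustration. The abstract does not say which similarity measure was used.

```python
def pagerank(links, weight=None, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [linked pages]}.

    weight(src, dst) returns a non-negative link weight. With weight=None
    every link counts equally, which is the random surfer model. Passing a
    similarity function gives a focused surfer (illustrative assumption).
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation: the surfer jumps to a uniformly chosen page.
        new = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            w = [1.0 if weight is None else weight(src, t) for t in targets]
            total = sum(w)
            if total == 0:
                # No usable links: spread this page's rank over all pages.
                for p in pages:
                    new[p] += damping * rank[src] / n
            else:
                # Follow each link with probability proportional to its weight.
                for t, wt in zip(targets, w):
                    new[t] += damping * rank[src] * wt / total
        rank = new
    return rank


# Hypothetical toy web: two galleries that both link to a plug-in page.
links = {
    "gallery_a": ["gallery_b", "plugin_download"],
    "gallery_b": ["gallery_a", "plugin_download"],
    "plugin_download": [],
}
# Hypothetical term sets standing in for page content.
terms = {
    "gallery_a": {"beach", "sunset", "photo"},
    "gallery_b": {"beach", "photo", "holiday"},
    "plugin_download": {"download", "plugin", "browser"},
}


def jaccard(src, dst):
    """Similarity between linking page and target (assumed measure)."""
    a, b = terms[src], terms[dst]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


random_rank = pagerank(links)                  # random surfer
focused_rank = pagerank(links, weight=jaccard)  # focused surfer

for page in sorted(links):
    print(page, round(random_rank[page], 3), round(focused_rank[page], 3))
```

In this toy graph the random surfer ranks the plug-in page above both galleries, because every gallery links to it. The focused surfer gives that link zero weight, since the pages share no terms, so the galleries rank higher.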
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Degree programme: | Computing Science |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 15 Feb 2018 07:44 |
Last Modified: | 15 Feb 2018 07:44 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/9346 |