
The Web-Graph: Clustering, Collecting & Classifying

Jong, Ivo, P. de (2021) The Web-Graph: Clustering, Collecting & Classifying. Master's Thesis / Essay, Artificial Intelligence.

Abstract

The presented research explores methods and applications of Community Detection (i.e. clustering) on the Web-Graph. A sparsely studied segment of clustering research is Genetic Algorithm-based Modularity maximization. The first part of this research explores variations of the state of the art in this domain and shows ways to speed up learning. Statistical community detection methods are subsequently used in a novel approach to improving an existing website Trust Score model. By predicting the Trust Score of a site from nearby sites (taken either from a BFS sample or from clusters), two models with some error distributions can create a joint probability distribution of where the true Trust Score should lie. No benefit of the clusters beyond the Web-Graph was found, but a novel method for improving an existing regression model without a ground-truth dataset is proposed. Lastly, this research investigates the use of community detection for Fake News site classification. By taking a BFS graph sample from a training set, candidate websites are collected and clustered. A classifier using cluster indices as features outperforms one based on extracted keyphrases on testing data, demonstrating the effectiveness of Web-Graph community detection. Unfortunately, neither classifier generalizes beyond the constructed dataset, indicating a problematic bias in that dataset.
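
For readers unfamiliar with the Modularity objective mentioned in the abstract, the following is a minimal sketch of the standard Newman modularity score for a given partition. It is not taken from the thesis: the function name `modularity`, the toy graph, and the choice to treat links as undirected and unweighted are all assumptions made for illustration. A Genetic Algorithm for community detection would typically use a score of this kind as its fitness function.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph (illustrative).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = 0
    degree_sum = defaultdict(int)  # d_c: total degree per community
    internal = defaultdict(int)    # L_c: edges with both ends in c
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}

q_split = modularity(toy_edges, two_clusters)   # 5/14, about 0.357
q_single = modularity(toy_edges, one_cluster)   # 0.0
```

Splitting the toy graph into its two triangles scores higher than placing every node in one community, which is the behaviour a modularity-maximizing search exploits.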

Item Type: Thesis (Master's Thesis / Essay)
Supervisor name: Wiering, M.A.
Degree programme: Artificial Intelligence
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 21 May 2021 09:02
Last Modified: 21 May 2021 09:02
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/24434
