Jong, Ivo, P. de (2021) The Web-Graph: Clustering, Collecting & Classifying. Master's Thesis / Essay, Artificial Intelligence.
Abstract
The presented research explores methods and applications of Community Detection (i.e. clustering) on the Web-Graph. Genetic Algorithm-based Modularity maximization is a sparsely explored area of clustering research. The first part of this research explores variations of the state of the art in this area and shows ways to speed up learning. Statistical community detection methods are then used in a novel approach to improve an existing website Trust Score model. By predicting the Trust Score of a site from nearby sites (found either by BFS or by clustering), two models, each with its own error distribution, can be combined into a joint probability distribution over where the true Trust Score should lie. The clusters provided no benefit beyond the Web-Graph itself, but a novel method is proposed for improving an existing regression model without a ground-truth dataset. Lastly, this research investigates the use of community detection for Fake News site classification. By taking a BFS sample of the graph starting from a training set of sites, candidate websites are collected and clustered. On the test data, a classifier that uses cluster indices as features outperforms one based on extracted keyphrases, demonstrating the effectiveness of Web-Graph community detection. Unfortunately, neither classifier generalizes beyond the constructed dataset, indicating a problematic bias in that dataset.
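Genetic Algorithm-based Modularity maximization treats each candidate partition of the graph as an individual and scores it by its modularity. As a rough illustration only, the sketch below computes the standard Newman-Girvan modularity, Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. It assumes a simple undirected, unweighted graph given as an edge list and a partition given as a node-to-label dictionary; the function name and input formats are hypothetical, and the thesis's own encoding, GA operators and speed-ups are not shown.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity Q of a partition (hypothetical helper, not the thesis code).

    edges:  list of (u, v) pairs; each undirected edge appears once, no self-loops assumed.
    labels: dict mapping node -> community id, e.g. one decoded GA chromosome.
    """
    m = 0                          # total number of edges
    degree = defaultdict(int)      # node -> degree
    internal = defaultdict(int)    # community -> edges with both endpoints inside it
    for u, v in edges:
        m += 1
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    if m == 0:
        return 0.0

    # Total degree per community (d_c in the formula above).
    deg_sum = defaultdict(int)
    for node, k in degree.items():
        deg_sum[labels[node]] += k

    # Q = sum_c [ L_c / m - (d_c / 2m)^2 ]; a GA would use this as the fitness to maximize.
    return sum(internal[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


# Example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
print(round(modularity(edges, split), 4))   # 0.3571
print(round(modularity(edges, merged), 4))  # 0.0
```

Splitting the two triangles scores 5/14 while lumping every node into one community scores 0, so a fitness-driven search would prefer the split.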
Item Type: Thesis (Master's Thesis / Essay)
Supervisor name: Wiering, M.A.
Degree programme: Artificial Intelligence
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 21 May 2021 09:02
Last Modified: 21 May 2021 09:02
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/24434