Hoekstra, M (2015) Identifying relationships between websites, relating websites to people. Master's Thesis / Essay, Computing Science.
Abstract
Finding ownership relations among the almost 1 billion websites on the Internet is like searching for a needle in a haystack. Website owners can establish connections between their websites, but they may choose not to in order to remain anonymous. Even when such connections are established, they do not follow a uniform protocol, which makes identifying these relations an intricate task. Nevertheless, identifying them opens up a broad range of opportunities: the information can be used in law enforcement, in commerce, or for optimizing search engines. In this work I present my investigation into whether relations between websites can be identified from their characteristics. These characteristics consist of identifiers, which are unique across websites, and detectable website technologies (e.g. servers, frameworks), which serve as features. Logistic regression is used with the features as input and the identifiers as the basis of the ground truth. Relations between websites are identified with an F1 score (i.e. the harmonic mean of precision and recall) of up to 90%.
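As a rough illustration of the evaluation described in the abstract, the sketch below labels website pairs as related when they share an identifier and scores a classifier's predictions with F1. The site names, identifier strings, technology sets, and the functions `label`, `predict`, and `f1_score` are all hypothetical. The classifier is a simple Jaccard-similarity threshold on the technology sets, used here only as a stand-in; it is not the logistic regression model from the thesis.

```python
from itertools import combinations

# Hypothetical toy data: site -> (identifiers, detected technologies).
sites = {
    "a.example": ({"ID-111"}, {"nginx", "wordpress"}),
    "b.example": ({"ID-111"}, {"nginx", "wordpress"}),
    "c.example": ({"ID-222"}, {"apache", "drupal"}),
    "d.example": ({"ID-222"}, {"nginx", "wordpress"}),
}

def label(a, b):
    """Ground truth: two sites count as related if they share an identifier."""
    return bool(sites[a][0] & sites[b][0])

def predict(a, b, threshold=0.5):
    """Stand-in classifier (an assumption, not the thesis model):
    Jaccard similarity of the two technology sets, thresholded."""
    tech_a, tech_b = sites[a][1], sites[b][1]
    return len(tech_a & tech_b) / len(tech_a | tech_b) >= threshold

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Evaluate on every unordered pair of sites.
pairs = list(combinations(sorted(sites), 2))
y_true = [label(a, b) for a, b in pairs]
y_pred = [predict(a, b) for a, b in pairs]
print(f"F1 = {f1_score(y_true, y_pred):.2f}")
```

On this toy data the stand-in finds one of the two related pairs and raises two false alarms, so precision is 1/3, recall is 1/2, and the F1 score is 0.4.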
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Degree programme: | Computing Science |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 15 Feb 2018 08:09 |
Last Modified: | 15 Feb 2018 08:09 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/13412 |