Determining k in k-means clustering by exploiting attribute distributions

Bocking, Oscar (2018) Determining k in k-means clustering by exploiting attribute distributions. Bachelor's Thesis, Artificial Intelligence.

Abstract

Methods for estimating the natural number of clusters (k) in a data set traditionally rely on the distance between points. In this project, an alternative was investigated: exploiting the distribution of informative nominal attributes over the clusters with a chi-squared test of independence, to see which value of k partitions the data in a way that is least likely to be random. Artificial data sets were used to assess the strategy's performance and viability in comparison to a well-established distance-based method. Results indicate that the proposed strategy tends to overestimate k and performs consistently only with some types of attribute. Despite this, it has value as a heuristic method when attributes are available, because it does not rely on distance information.
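
The strategy described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes scikit-learn's `KMeans` and SciPy's `chi2_contingency`, and the helper names `chi2_p_value` and `estimate_k`, the synthetic three-blob data, and the candidate range 2 to 6 are all hypothetical choices made for illustration. For each candidate k, the data are clustered, a cluster-by-attribute contingency table is built, and the k with the smallest p-value is taken as the partition least likely to be random.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.cluster import KMeans


def chi2_p_value(cluster_labels, attribute):
    """p-value of a chi-squared independence test: clusters vs. nominal attribute."""
    # Re-index both label sets to 0..n-1 so they can address a contingency table.
    _, rows = np.unique(cluster_labels, return_inverse=True)
    _, cols = np.unique(attribute, return_inverse=True)
    table = np.zeros((rows.max() + 1, cols.max() + 1))
    np.add.at(table, (rows, cols), 1)  # count co-occurrences
    _, p_value, _, _ = chi2_contingency(table)
    return p_value


def estimate_k(points, attribute, candidate_ks):
    """Return the candidate k with the smallest p-value, plus all p-values."""
    p_values = {}
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
        p_values[k] = chi2_p_value(labels, attribute)
    best_k = min(p_values, key=p_values.get)
    return best_k, p_values


# Synthetic data (an assumption for this sketch): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_cluster = np.repeat(np.arange(3), 50)
points = centres[true_cluster] + rng.normal(size=(150, 2))

# Informative nominal attribute: matches the true cluster for 40 of every
# 50 points; the remaining 10 are split evenly over the other two values.
offsets = np.tile(np.repeat([0, 1, 2], [40, 5, 5]), 3)
attribute = (true_cluster + offsets) % 3

best_k, p_values = estimate_k(points, attribute, range(2, 7))
print("estimated k:", best_k)
for k, p in p_values.items():
    print(f"k={k}: p={p:.3g}")
```

Comparing raw p-values across k is only one possible selection rule. Splitting a true cluster adds rows to the table without necessarily weakening the association, which may be one way the overestimation reported in the abstract arises.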

Item Type: Thesis (Bachelor's Thesis)
Supervisor: Schomaker, L.R.B. (L.R.B.Schomaker@rug.nl)
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 13 Jul 2018
Last Modified: 20 Jul 2018 11:36
URI: http://fse.studenttheses.ub.rug.nl/id/eprint/17848
