Topic Modeling Evaluations: The Relationship Between Coherency and Accuracy

Hadiat, Alfiuddin R. (2022) Topic Modeling Evaluations: The Relationship Between Coherency and Accuracy. Master's Thesis / Essay, Computational Cognitive Science.

Abstract

Topic models are generally evaluated using coherence measures, which score a topic by how frequently its representative words co-occur with one another. Research shows that coherence correlates well with human judgment. However, no research has examined the correlation between coherence and classifier accuracy: can coherence guide topic model selection when the model is used for a prediction problem? To fill this gap, this project conducts two experiments that investigate this correlation. Two topic models (LDA and BERTopic) are trained and evaluated with four coherence measures (UCI, UMass, NPMI, and CV). Classifiers (Logistic Regression and Decision Trees) are then trained on topic model features to predict corpus categories, and their accuracies are correlated with the coherence measures. Classifier accuracy correlated significantly with UMass and NPMI. However, the UMass correlations were inconsistent, so only NPMI could be considered generalizable for estimating classifier performance. The results also showed that classifier performance differed between topic models: although BERTopic had higher coherence scores, LDA led to better logistic regression classifiers.
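
To make the notion of a co-occurrence-based coherence measure concrete, the following is a minimal Python sketch of NPMI coherence for a single topic. It is an illustration under simplifying assumptions and not the thesis's implementation: co-occurrence is counted per document instead of over a sliding window, the corpus is a toy example, and the function name npmi_coherence and its handling of the two edge cases are choices made here for illustration.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words.

    Assumption: probabilities are estimated from document-level
    co-occurrence (a word pair co-occurs if both words appear in
    the same document). Published measures typically use a sliding
    window over a reference corpus instead.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every given word.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            # The pair never co-occurs: use the limiting value -1.
            scores.append(-1.0)
        elif p12 == 1:
            # Both words occur in every document; NPMI is 0/0 here,
            # so this sketch assigns 1 by convention (an assumption).
            scores.append(1.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog"],
    ["stock", "market"],
    ["stock", "market", "trade"],
]

coherent_topic = npmi_coherence(["cat", "dog"], docs)
mixed_topic = npmi_coherence(["cat", "stock"], docs)
print(coherent_topic, mixed_topic)
```

In this toy corpus, "cat" and "dog" always appear together, so their topic scores at the top of the NPMI range, while "cat" and "stock" never share a document and score at the bottom. Correlating such scores with classifier accuracy across many trained topic models is the kind of analysis the abstract describes.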

Item Type:	Thesis (Master's Thesis / Essay)
Supervisor name:	Doornkamp, J. and Spenader, J.K.
Degree programme:	Computational Cognitive Science
Thesis type:	Master's Thesis / Essay
Language:	English
Date Deposited:	01 Sep 2022 10:38
Last Modified:	01 Sep 2022 10:38
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/28618
