Multi-Document Keyphrase Extraction

Skala, Daniel (2022) Multi-Document Keyphrase Extraction. Bachelor's Thesis, Computing Science.

Abstract

Multi-Document Keyphrase Extraction (MDKE) is one of the fundamental problems within Natural Language Processing (NLP). It is widely used in practice for tasks such as text summarisation, topic generation and clustering. One of the recent advances in MDKE is the creation of the MK-DUC-01 dataset. Because the dataset is new and MDKE has received little research attention, we investigate the reproducibility of the performance of various keyphrase extraction (KE) algorithms. In addition, we propose two novel methods for keyphrase extraction built on top of TopicRank. The first algorithm, 'SlidoRank', is asymptotically faster and more scalable because it replaces the slow topic graph generation and the PageRank algorithm used in TopicRank. It also outperforms TopicRank in terms of F1@k scores on the MK-DUC-01 dataset. The second proposed algorithm, 'Embeddings', extends SlidoRank with semantic similarity between keyphrases. For a special configuration of hyperparameters, the Embeddings algorithm yields even better F1@k scores than SlidoRank.
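
For readers unfamiliar with the metric, the following is a minimal sketch of how an F1@k score is commonly computed for a single ranked list of predicted keyphrases. It is not taken from the thesis: the function name `f1_at_k`, the lowercase exact-match normalisation, and the choice to divide precision by the number of phrases actually returned (rather than by k) are all assumptions made for illustration, and the thesis may use stemming or a different matching rule.

```python
def f1_at_k(predicted, gold, k):
    """Illustrative F1@k for one ranked prediction list against a gold set.

    Assumptions (not from the thesis): phrases match if they are equal
    after lowercasing and trimming whitespace, and precision is divided
    by the number of phrases kept, which may be fewer than k.
    """
    # Keep the first k distinct predictions, preserving rank order.
    top_k = []
    for phrase in predicted:
        norm = phrase.strip().lower()
        if norm not in top_k:
            top_k.append(norm)
        if len(top_k) == k:
            break

    gold_set = {g.strip().lower() for g in gold}
    if not top_k or not gold_set:
        return 0.0

    # Count predictions that appear in the gold set.
    hits = sum(1 for p in top_k if p in gold_set)
    if hits == 0:
        return 0.0

    precision = hits / len(top_k)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data, invented for this sketch.
predicted = ["neural network", "Deep Learning", "graph", "pagerank"]
gold = ["deep learning", "pagerank", "topic model"]
score_at_3 = f1_at_k(predicted, gold, 3)
score_at_4 = f1_at_k(predicted, gold, 4)
```

A dataset-level score is usually obtained by averaging this value over all topics or document clusters, although the exact averaging scheme is again an assumption here.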

Item Type:	Thesis (Bachelor's Thesis)
Supervisor name:	Azzopardi, G. and Mohsen, F.F.M.
Degree programme:	Computing Science
Thesis type:	Bachelor's Thesis
Language:	English
Date Deposited:	19 Jul 2022 14:35
Last Modified:	19 Jul 2022 14:35
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/28033
