Skala, Daniel (2022) Multi-Document Keyphrase Extraction. Bachelor's Thesis, Computing Science.
Abstract
Multi-Document Keyphrase Extraction (MDKE) is one of the fundamental problems within Natural Language Processing (NLP). It is widely used in practice for tasks such as text summarisation, topic generation and clustering. One of the recent advances in MDKE is the creation of the MK-DUC-01 dataset. Because the dataset is new and MDKE has received little research attention, we investigate the reproducibility of the performance of various keyphrase extraction (KE) algorithms on it. In addition, we propose two novel methods for keyphrase extraction built on top of TopicRank. The first algorithm, 'SlidoRank', is asymptotically faster and more scalable because it replaces the slow topic graph generation and the PageRank algorithm used in TopicRank. It also outperforms TopicRank in terms of F1@k scores on the MK-DUC-01 dataset. The second proposed algorithm, 'Embeddings', extends SlidoRank with the semantic similarity of keyphrases. For a particular configuration of hyperparameters, the Embeddings algorithm yields even better F1@k scores than SlidoRank.
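The F1@k metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the thesis matches predicted and gold keyphrases, so the version below assumes exact matching after lowercasing (evaluations of this kind often also apply stemming). The function name `f1_at_k` and its parameters are hypothetical and are not taken from the thesis.

```python
def f1_at_k(predicted, gold, k):
    """F1 over the top-k predicted keyphrases.

    Assumption: a prediction counts as correct if it equals a gold
    keyphrase exactly after stripping whitespace and lowercasing.
    """
    top_k = [p.strip().lower() for p in predicted[:k]]
    gold_set = {g.strip().lower() for g in gold}
    if not top_k or not gold_set:
        return 0.0
    # Use a set so a repeated prediction is not counted as two matches.
    matches = len(set(top_k) & gold_set)
    precision = matches / len(top_k)
    recall = matches / len(gold_set)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With a ranked list of four predictions of which the second and fourth are among three gold keyphrases, `f1_at_k(..., k=3)` rewards only the one match inside the top three, while `k=4` counts both.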
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Azzopardi, G. and Mohsen, F.F.M. |
Degree programme: | Computing Science |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 19 Jul 2022 14:35 |
Last Modified: | 19 Jul 2022 14:35 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/28033 |