Mol, Barbera, de (2020) A Comparison of Data-Driven Morphological Segmenters for Low-Resource Polysynthetic Languages: A Case Study of Greenlandic. Bachelor's Thesis, Artificial Intelligence.
Abstract
Morphological segmentation is vital in many areas of natural language processing, including machine translation. However, very little research in this field has been performed on low-resource polysynthetic languages. Rather, most research has focused on languages with existing resources and moderate morphological inflection. Greenlandic is such a polysynthetic language, and because it has relatively few native speakers, few resources have been developed for it. For this paper, the author manually crafted the largest publicly accessible annotated dataset of Greenlandic morphological segmentations. With this dataset, intrinsic experiments are conducted in which seven methods for morphological segmentation, one rule-based system and six (supervised) machine learning systems, are compared by calculating precision, recall, F1-score and accuracy using tenfold cross-validation. The fully supervised (F1-score = 0.633, accuracy = 0.542) and semi-supervised (F1-score = 0.631, accuracy = 0.553) Conditional Random Fields perform best. Extrinsically, a baseline with no segmentation and the six most promising models from the intrinsic evaluation are implemented in a neural machine translation model and their BLEU scores are compared. The results of the extrinsic evaluation were, however, not reliable because the neural machine translation models performed below par.
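The intrinsic metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the thesis scores boundaries or whole morphemes, so this example assumes boundary-level precision, recall and F1, and takes accuracy to mean the share of words segmented exactly as in the gold standard. The function names `boundaries` and `evaluate`, the `-` separator, and the toy strings are hypothetical and are not taken from the thesis or its dataset.

```python
def boundaries(segmentation, sep="-"):
    """Return the character offsets at which a morpheme boundary occurs."""
    positions = set()
    offset = 0
    # Every morph except the last one is followed by a boundary.
    for morph in segmentation.split(sep)[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def evaluate(gold, predicted, sep="-"):
    """Boundary-level precision/recall/F1 plus whole-word accuracy.

    Assumption: gold[i] and predicted[i] segment the same surface word.
    """
    tp = fp = fn = exact = 0
    for gold_seg, pred_seg in zip(gold, predicted):
        gold_b = boundaries(gold_seg, sep)
        pred_b = boundaries(pred_seg, sep)
        tp += len(gold_b & pred_b)   # boundaries found correctly
        fp += len(pred_b - gold_b)   # boundaries predicted but not in gold
        fn += len(gold_b - pred_b)   # gold boundaries that were missed
        exact += gold_b == pred_b    # word counted only if fully correct

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = exact / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Invented toy strings, not Greenlandic data.
gold = ["ab-cd-e", "xy-z"]
predicted = ["ab-cde", "xy-z"]
scores = evaluate(gold, predicted)
```

In a tenfold cross-validation setup, a function like this would be applied to each held-out fold and the ten results averaged. Under these assumptions a model can reach a higher F1 than accuracy, because a word with one missed boundary still contributes correct boundaries to F1 but counts as wrong for accuracy.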
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Spenader, J.K. |
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 31 Aug 2020 08:26 |
Last Modified: | 31 Aug 2020 08:26 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/23300 |