Roest, Christian (2020) Morphological Segmentation of Polysynthetic Languages for Neural Machine Translation: The Case of Inuktitut. Master's Thesis / Essay, Artificial Intelligence.
Abstract
Segmentation of words into sub-word units is a crucial preprocessing step in modern machine translation systems, as it allows the translation of unseen and rare words. It is still largely unknown how existing methods perform on polysynthetic languages, a category of languages with a much higher degree of morphological complexity. Characteristic of polysynthetic languages are long sentence-words that consist of many morphemes. These long words are a result of inflection and agglutination, which allow words to carry a much more detailed meaning than words in most other languages can. We hypothesise that, to deal with such complex languages, translation systems require a robust segmentation method to isolate meaningful parts of a word accurately and consistently. Current state-of-the-art language-agnostic segmenters were not designed with polysynthetic languages in mind, which raises the question of whether they can provide adequate performance for these languages. In this thesis, various methods are compared on their ability to generate linguistically correct segmentations, and on their ability to produce quality translations as part of a state-of-the-art neural machine translation system for the low-resource polysynthetic language Inuktitut. With the results we determine
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Supervisor name: | Spenader, J.K. and Toral Ruiz, A. |
Degree programme: | Artificial Intelligence |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 11 Sep 2020 09:47 |
Last Modified: | 11 Sep 2020 09:47 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/23334 |