Morphological Segmentation of Polysynthetic Languages for Neural Machine Translation: The Case of Inuktitut

Roest, Christian (2020) Morphological Segmentation of Polysynthetic Languages for Neural Machine Translation: The Case of Inuktitut. Master's Thesis / Essay, Artificial Intelligence.

Abstract

Segmentation of words into sub-word units is a crucial preprocessing step in modern machine translation systems, as it allows the translation of unseen and rare words. It is still largely unknown how existing methods perform on polysynthetic languages, a category of languages with a much higher degree of morphological complexity. Polysynthetic languages are characterised by long sentence-words that consist of many morphemes. These long words are a result of inflection and agglutination, which allow words to carry a much more detailed meaning than words in most other languages can. We hypothesise that, to deal with such complex languages, translation systems require a robust segmentation method to isolate meaningful parts of a word accurately and consistently. The current state-of-the-art language-agnostic segmenters were not designed with polysynthetic languages in mind, which raises the question of whether they can provide adequate performance for these languages. In this thesis, various methods are compared on their ability to generate linguistically correct segmentations, and on their ability to produce quality translations as part of a state-of-the-art neural machine translation system for the low-resource polysynthetic language Inuktitut. With the results we determine

Item Type: Thesis (Master's Thesis / Essay)
Supervisor name: Spenader, J.K. and Toral Ruiz, A.
Degree programme: Artificial Intelligence
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 11 Sep 2020 09:47
Last Modified: 11 Sep 2020 09:47
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/23334
