Improving Domain Robustness on Translating Out Domain Corpus

Uzodinma, Ethelbert (2022) Improving Domain Robustness on Translating Out Domain Corpus. Master's Thesis / Essay, Artificial Intelligence.

Abstract

Deep learning has become a leading approach to natural language processing tasks such as machine translation, because it outperforms translations previously produced with statistical techniques. Evidence for this can be seen in translations involving a bilingual or multilingual parallel corpus where the source and target texts come from the same genre, known as “in-domain translation”. Despite these achievements, one of the six major challenges of neural machine translation remains: neural networks still perform poorly when translating text from a genre different from that of the training set. This is referred to as “out-of-domain translation”. This thesis investigates methods that can be used to improve out-of-domain translation, such as byte-pair encoding, sub-word regularization, beam size, label smoothing and domain adaptation. These methods were applied to the training and fine-tuning stages in multiple out-of-domain translation experiments. The results showed improvement in fluency and adequacy. Evaluations using BLEU scores and perplexity showed an overall improvement of above 10% in out-of-domain translations.
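
Two of the quantities named in the abstract, label smoothing and perplexity, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the smoothing value of 0.1 and the choice to spread the smoothing mass uniformly over the non-gold tokens are assumptions made here for illustration only, and other variants of label smoothing exist.

```python
import math


def smoothed_targets(vocab_size, gold_index, epsilon=0.1):
    """Build a label-smoothed target distribution (illustrative variant).

    The gold token receives 1 - epsilon; the remaining epsilon is shared
    uniformly among the other vocab_size - 1 tokens (an assumption here).
    """
    off_value = epsilon / (vocab_size - 1)
    targets = [off_value] * vocab_size
    targets[gold_index] = 1.0 - epsilon
    return targets


def cross_entropy(predicted_probs, targets):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(targets, predicted_probs) if t > 0)


def perplexity(token_log_probs):
    """Perplexity as the exponential of the mean negative log-probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical example: a 5-token vocabulary with the gold token at index 2.
targets = smoothed_targets(vocab_size=5, gold_index=2)
predicted = [0.05, 0.05, 0.8, 0.05, 0.05]
loss = cross_entropy(predicted, targets)

# A model that assigns probability 0.25 to every token of a 4-token sentence.
ppl = perplexity([math.log(0.25)] * 4)
```

With smoothing, the loss no longer rewards the model for pushing the gold probability all the way to 1. Lower perplexity means the model assigns higher probability to the reference tokens.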

Item Type:	Thesis (Master's Thesis / Essay)
Supervisor name:	Spenader, J.K. and Doornkamp, J.
Degree programme:	Artificial Intelligence
Thesis type:	Master's Thesis / Essay
Language:	English
Date Deposited:	01 Sep 2022 10:28
Last Modified:	01 Sep 2022 10:28
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/28621
