Kim, Sohyung (2021) Using Confidential Data for Domain Adaptation of Neural Machine Translation. Master's Thesis / Essay, Artificial Intelligence.
Abstract
Domain adaptation has led to remarkable achievements in Neural Machine Translation (NMT). Since it relies on in-domain data, the availability of such data remains essential to ensure the quality of NMT, especially in technical domains. However, obtaining such data is often challenging, and in many real-world scenarios this is further aggravated by data confidentiality or copyright concerns. We study the problem of domain adaptation in NMT when domain-specific data cannot be shared due to confidentiality issues. We propose to fragment the data into phrase pairs and to fine-tune a generic NMT model on a shuffled random sample of these pairs instead of on the full sentences. Despite the loss of long segments, we find that NMT quality can benefit considerably from this adaptation and that further gains can be obtained with a simple tagging technique.
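The abstract does not specify how the phrase pairs are obtained or what the tag looks like. A minimal sketch of the general idea, assuming whitespace-tokenized sentence pairs with word alignments from some external aligner and the standard alignment-consistency criterion from phrase-based SMT, could look as follows. The names `extract_phrase_pairs`, `build_finetuning_set`, `max_len` and the `<frag>` tag token are hypothetical illustrations, not the thesis's implementation.

```python
import random
from typing import List, Optional, Set, Tuple

# (source_index, target_index) links, assumed to come from an external word aligner.
Alignment = Set[Tuple[int, int]]


def extract_phrase_pairs(src: List[str], tgt: List[str], alignment: Alignment,
                         max_len: int = 4) -> List[Tuple[str, str]]:
    """Return phrase pairs consistent with the word alignment.

    Simplified version of standard phrase extraction: unaligned words at
    span boundaries are not used to grow additional pairs.
    """
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word in the span may link outside the source span.
            if all(s_start <= s <= s_end
                   for (s, t) in alignment if t_start <= t <= t_end):
                pairs.append((" ".join(src[s_start:s_end + 1]),
                              " ".join(tgt[t_start:t_end + 1])))
    return pairs


def build_finetuning_set(corpus: List[Tuple[str, str, Alignment]], sample_size: int,
                         tag: Optional[str] = None, seed: int = 0) -> List[Tuple[str, str]]:
    """Pool phrase pairs from all sentences, then draw a shuffled random sample.

    Sentences longer than max_len can never be emitted as a single pair;
    shorter ones may still appear whole. If `tag` (a hypothetical tag token)
    is given, it is prepended to every source phrase.
    """
    pool = []
    for src, tgt, alignment in corpus:
        pool.extend(extract_phrase_pairs(src.split(), tgt.split(), alignment))
    rng = random.Random(seed)
    # random.sample draws without replacement and returns items in random order,
    # so phrases from the same sentence are no longer kept adjacent.
    sample = rng.sample(pool, min(sample_size, len(pool)))
    if tag is not None:
        sample = [(f"{tag} {s}", t) for s, t in sample]
    return sample


if __name__ == "__main__":
    corpus = [("the red house", "das rote Haus", {(0, 0), (1, 1), (2, 2)})]
    for pair in build_finetuning_set(corpus, sample_size=3, tag="<frag>"):
        print(pair)
```

Sampling from a pool built across the whole corpus, rather than sentence by sentence, is what breaks the link between neighbouring fragments; a real setup would also need a maximum phrase length and sample size chosen for the data at hand.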
| Field | Value |
|---|---|
| Item Type | Thesis (Master's Thesis / Essay) |
| Supervisors | Bisazza, A.; Spenader, J.K.; Turkmen, F. |
| Degree programme | Artificial Intelligence |
| Thesis type | Master's Thesis / Essay |
| Language | English |
| Date deposited | 23 Aug 2021 09:59 |
| Last modified | 23 Aug 2021 09:59 |
| URI | https://fse.studenttheses.ub.rug.nl/id/eprint/25668 |