Dima, Alina Elena (2024) Next-Word Prediction in Low-Resource Languages: A Study on Hrusso Aka. Master's Thesis / Essay, Artificial Intelligence.
Abstract
Most of the world's languages are endangered or potentially vulnerable, so it is essential to develop technological tools that help preserve them. This project focuses on Hrusso Aka, an endangered language isolate spoken in Northeast India. The aim is to contribute to the revitalisation of the language by developing a next-word prediction model for mobile device keyboards. Because few written resources exist in Hrusso Aka, the major challenge of this study is data scarcity. To find a model that can learn effectively from such limited data, we implemented and compared models of varying complexity: an n-gram model, a recurrent neural network (RNN) model, and a compact transformer model. The models were evaluated on two criteria: the accuracy of next-word prediction and the inference time, since the model must run in real time for mobile keyboard use. The GRU-based RNN model significantly outperformed the n-gram and transformer models on both criteria. It achieved a top-3 accuracy of 19.94% and an inference time of 0.0006 s, whereas the n-gram and transformer models were slower and achieved top-3 accuracies below 10%. These findings suggest that, although transformer models are state of the art for many high-resource tasks, simpler models appear to be better suited to low-resource languages.
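To make the two evaluation criteria concrete, the following is a minimal Python sketch of a bigram (n = 2) next-word predictor scored by top-k accuracy, with a simple timing of one prediction. It is an illustration only and not the thesis implementation: the function names, the whitespace tokenisation, the choice of n = 2, and the placeholder tokens in the toy corpus are all assumptions, and no Hrusso Aka data is used.

```python
from collections import Counter, defaultdict
import time


def train_bigram(sentences):
    """Count, for each word, which words follow it in the training sentences."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()  # assumption: whitespace tokenisation
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return followers


def predict_top_k(followers, prev_word, k=3):
    """Return up to k most frequent followers of prev_word ([] if unseen)."""
    counts = followers.get(prev_word, Counter())
    return [word for word, _ in counts.most_common(k)]


def top_k_accuracy(followers, sentences, k=3):
    """Fraction of next-word positions whose true word is among the top k."""
    hits = 0
    total = 0
    for sentence in sentences:
        tokens = sentence.split()
        for prev, target in zip(tokens, tokens[1:]):
            hits += target in predict_top_k(followers, prev, k)
            total += 1
    return hits / total if total else 0.0


# Hypothetical toy corpus with placeholder tokens (not real language data).
toy_train = ["a b c", "a b d", "a c d"]
toy_test = ["a b c", "d a"]

model = train_bigram(toy_train)
accuracy = top_k_accuracy(model, toy_test, k=3)

# Time a single prediction, analogous to the inference-time criterion.
start = time.perf_counter()
suggestions = predict_top_k(model, "a", k=3)
elapsed = time.perf_counter() - start

print(f"top-3 accuracy: {accuracy:.2%}")
print(f"suggestions after 'a': {suggestions}")
print(f"inference time: {elapsed:.6f} s")
```

In this sketch, a context word that never appeared in training yields no suggestions and counts as a miss, which is one way data scarcity lowers top-k accuracy for count-based models.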
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Supervisor name: | Jones, S.M. and Tashu, T. M. |
Degree programme: | Artificial Intelligence |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 28 Nov 2024 09:01 |
Last Modified: | 28 Nov 2024 09:01 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/34443 |