Next-Word Prediction in Low-Resource Languages: A Study on Hrusso Aka

Dima, Alina Elena (2024) Next-Word Prediction in Low-Resource Languages: A Study on Hrusso Aka. Master's Thesis / Essay, Artificial Intelligence.

Abstract

With most of the world's languages endangered or potentially vulnerable, it is essential to develop technological tools that support their preservation. This project focuses on Hrusso Aka, an endangered language isolate spoken in Northeast India. The aim is to contribute to the revitalisation of this language by developing a next-word prediction model for mobile device keyboards. Because few written resources exist in Hrusso Aka, the major challenge of this study is data scarcity. To find a model that can learn effectively from such limited data, we experimented with models of varying complexity, implementing and comparing an n-gram model, a recurrent neural network (RNN) model based on gated recurrent units (GRUs), and a compact transformer model. The models were evaluated on two criteria: the accuracy of predicting the next word, and the inference time, since the model must run in real time for mobile keyboard use. In our assessment, the GRU-based RNN model significantly outperformed the n-gram and transformer models on both criteria. It achieved a top-3 accuracy of 19.94% and an inference time of 0.0006 s, whereas the n-gram and transformer models were slower and achieved top-3 accuracies below 10%. These findings suggest that, although transformer models are state-of-the-art solutions for many high-resource tasks, simpler models appear to be better suited to low-resource languages.
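To make the n-gram baseline and the top-3 accuracy criterion concrete, here is a minimal sketch. The abstract does not state the n-gram order, smoothing, tokenisation, or exactly how top-3 accuracy was computed, so this assumes an unsmoothed bigram model with a sentence-start token `<s>`. The function names (`train_bigram`, `predict_top_k`, `top_k_accuracy`) and the placeholder tokens in the usage example are hypothetical illustrations, not the thesis code or Hrusso Aka data.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word.

    Assumption: an unsmoothed bigram model; "<s>" marks the sentence start.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_top_k(counts, prev, k=3):
    """Return up to k most frequent successors of `prev` (empty list if unseen)."""
    # .get avoids inserting unseen contexts into the defaultdict.
    return [w for w, _ in counts.get(prev, Counter()).most_common(k)]


def top_k_accuracy(counts, sentences, k=3):
    """Fraction of positions where the true next word is among the top-k predictions."""
    hits = total = 0
    for tokens in sentences:
        padded = ["<s>"] + tokens
        for prev, nxt in zip(padded, padded[1:]):
            total += 1
            if nxt in predict_top_k(counts, prev, k):
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Placeholder tokens, purely illustrative.
    train = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]
    counts = train_bigram(train)
    print(predict_top_k(counts, "a"))                   # ['b', 'c']
    print(top_k_accuracy(counts, [["a", "c"]], k=1))    # 0.5
    print(top_k_accuracy(counts, [["a", "c"]], k=3))    # 1.0
```

A keyboard suggestion bar typically shows a few candidates, which is why a top-k metric (here k = 3) is more informative than top-1. Per-prediction latency, the second criterion, could be measured by wrapping `predict_top_k` calls with `time.perf_counter()` and averaging.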

Item Type: Thesis (Master's Thesis / Essay)
Supervisor name: Jones, S.M. and Tashu, T.M.
Degree programme: Artificial Intelligence
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 28 Nov 2024 09:01
Last Modified: 28 Nov 2024 09:01
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/34443