Jawahier, Arjan (2019) Dynamic Label Propagation in a Lifelong Machine Learning Context. Bachelor's Thesis, Artificial Intelligence.
Abstract
Transcription of historical handwritten documents is an important domain within machine learning, but it is far from a solved problem. Monk is an engine for transcribing handwritten texts and spotting words in them. It trains its classifiers on input from volunteers and improves with each label. However, the labeling process can still be improved. In this thesis, a new method of acquiring word labels is developed and tested. A convolutional neural network is trained to rate lists of word instances on their visual structure. This rating is then combined with the Metropolis-Hastings sampling algorithm to determine which word instances to label for the largest increase in average F1-score. The technique yields probability distributions, which are subsequently used in sampling and labeling simulations. The convolutional neural network learned to rate lists of word instances well, with a reasonable mean squared error and a low standard deviation. However, the sampling and labeling simulations show that the generated distributions are not very effective: uniform random guessing outperforms them.
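For readers unfamiliar with the sampling step, the following is a minimal, generic sketch of Metropolis-Hastings over a discrete set of items. It is not the thesis's implementation: the abstract does not specify the target distribution, the proposal, or how the network's ratings enter the acceptance rule. The function name `metropolis_hastings`, the uniform proposal, and the `ratings` values are all assumptions made for illustration.

```python
import random
from collections import Counter


def metropolis_hastings(scores, n_steps, seed=0):
    """Sample indices with probability proportional to scores[i].

    Assumes every score is strictly positive. The proposal is a uniform
    draw over all indices, which is symmetric, so the acceptance
    probability reduces to min(1, scores[proposal] / scores[current]).
    """
    rng = random.Random(seed)
    current = rng.randrange(len(scores))
    samples = []
    for _ in range(n_steps):
        proposal = rng.randrange(len(scores))
        accept_prob = min(1.0, scores[proposal] / scores[current])
        if rng.random() < accept_prob:
            current = proposal  # accept the move
        samples.append(current)  # a rejected move repeats the current state
    return samples


# Hypothetical ratings for five word instances (illustrative values only).
ratings = [0.1, 0.2, 0.4, 0.2, 0.1]
samples = metropolis_hastings(ratings, n_steps=20000)
counts = Counter(samples)
frequencies = [counts[i] / len(samples) for i in range(len(ratings))]
```

With enough steps, `frequencies` should approximate the normalised ratings, so higher-rated items are visited more often. Only ratios of scores are used, which is why the scores do not need to sum to one.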
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Schomaker, L.R.B. |
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 29 Jul 2019 |
Last Modified: | 30 Jul 2019 09:52 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/20469 |