Alkemade, H.C. (2016) Dutch factuality classification: Using machine translation to create a Dutch version of FactBank. Bachelor's Thesis, Artificial Intelligence.
Abstract
In texts, people refer to events that may or may not have happened. Information about how the writer presents an event is called event factuality. Factuality is divided into certainty and polarity. FactBank is an English corpus of events annotated with their factuality values. No such corpus exists for Dutch, even though this information would be useful to have. In this project, TectoMT and Google Translate are used to create a Dutch version of FactBank. Each sentence is represented by a word vector that combines frequency information with syntactic distance to represent scope. A stochastic gradient learning routine is used to train a classifier for Dutch. The classifier is tested on a small Dutch corpus annotated with factuality values. The results show that certainty classification does not perform better than the majority-class baseline. Polarity classification, however, does perform better, achieving an F-measure of 0.98.
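The comparison against a majority-class baseline mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's code or data: the labels below are made-up toy values, the function names are hypothetical, and the F-measure is computed for a single class, which may differ from how the thesis averaged its scores.

```python
from collections import Counter


def majority_baseline(train_labels, n_items):
    """Predict the most frequent training label for every test item."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_items


def f_measure(gold, predicted, positive):
    """Balanced F-measure (F1) for one class, treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy polarity labels, invented purely for illustration.
train = ["pos", "pos", "pos", "neg"]
gold = ["pos", "neg", "pos", "pos", "neg"]
classifier_output = ["pos", "neg", "pos", "pos", "pos"]

baseline_output = majority_baseline(train, len(gold))

# Score the minority class, which the majority baseline never predicts.
baseline_f = f_measure(gold, baseline_output, positive="neg")
classifier_f = f_measure(gold, classifier_output, positive="neg")

print(f"baseline F: {baseline_f:.2f}, classifier F: {classifier_f:.2f}")
```

A classifier only counts as an improvement when its score exceeds what this trivial always-predict-the-majority strategy achieves on the same test set.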
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 15 Feb 2018 08:10 |
Last Modified: | 15 Feb 2018 08:10 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/13669 |