Chen, Amber (2021) Using Machine Translated Lexicons for Hate Speech Classification on Dutch Covid-19 Twitter Data. Bachelor's Thesis, Artificial Intelligence.
Abstract
Growth in the user population of social platforms, together with more time spent online per user during the COVID-19 pandemic, has led to more online hate speech. Efforts to build automated hate speech classification systems have mainly relied on English resources. For low-resource languages like Dutch, machine translation of existing lexicons is a possible workaround. A COVID-19-related Twitter data set was filtered using these translated lexicons. The data was annotated and then classified using an SVM trained on features from emotional models and on anger intensity. TF-IDF showed that (translated) lexical entries were among the 20 highest-scoring n-grams. Inter-rater agreement was calculated after annotation and was found to be low. Of the calculated features, only anger intensity seemed useful for classification: the other features scored several orders of magnitude lower in terms of information gain. The low agreement and the low information gain of most features were reflected in the classification results, where all systems achieved only chance-level accuracy.
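As an illustration of what low inter-rater agreement means in practice, the sketch below computes Cohen's kappa for two annotators. The abstract does not name the agreement statistic used in the thesis, so the choice of Cohen's kappa is an assumption, and the function name `cohens_kappa` and the example labels are hypothetical, not taken from the thesis data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of eight tweets (not thesis data).
annotator_1 = ["hate", "none", "hate", "none", "none", "hate", "none", "none"]
annotator_2 = ["none", "none", "hate", "hate", "none", "none", "none", "hate"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A kappa near zero means the annotators agree no more often than chance would predict, even when raw agreement looks moderate. Labels with agreement this low give a classifier little consistent signal to learn from.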
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Spenader, J.K. and Doornkamp, J. |
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 31 Aug 2021 14:05 |
Last Modified: | 31 Aug 2021 14:05 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/25914 |