Using Machine Translated Lexicons for Hate Speech Classification on Dutch Covid-19 Twitter Data

Chen, Amber (2021) Using Machine Translated Lexicons for Hate Speech Classification on Dutch Covid-19 Twitter Data. Bachelor's Thesis, Artificial Intelligence.

Abstract

During the COVID-19 pandemic, social platforms gained users and each user spent more time online, which has resulted in more online hate speech. Efforts to create automated hate speech classification systems have mainly made use of English resources. For low-resource languages like Dutch, machine translation of existing lexicons is a possible workaround. A COVID-19-related Twitter data set was filtered using these machine-translated lexicons. The data was annotated and classified using an SVM, which was trained on features derived from emotion models and on anger intensity. A TF-IDF analysis showed that (translated) lexicon entries were among the 20 highest-scoring n-grams. Inter-rater agreement, calculated after annotation, was found to be low. Of the calculated features, only anger intensity seemed useful for classification: the other features scored several orders of magnitude lower in terms of information gain. The low agreement and the low information gain of most features were reflected in the classification results, where all systems reached only chance-level accuracy.
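
To illustrate what a low inter-rater agreement score means, the following is a minimal sketch of Cohen's kappa for two annotators. The abstract does not state which agreement statistic the thesis used, so the choice of Cohen's kappa is an assumption, and the function name `cohen_kappa` and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two raters and nominal labels; the thesis may have used
    a different statistic or more than two annotators.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each rater's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in all_labels) / (n * n)

    # If both raters always use the same single label, agreement is trivially perfect.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations, not data from the thesis.
annotator_1 = ["hate", "hate", "none", "none"]
annotator_2 = ["hate", "none", "hate", "none"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A kappa near 0 means the annotators agree no more often than chance would predict. Labels with that little agreement give a classifier a noisy training signal, which is consistent with the chance-level accuracy reported above.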

Item Type: Thesis (Bachelor's Thesis)
Supervisor name: Spenader, J.K. and Doornkamp, J.
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 31 Aug 2021 14:05
Last Modified: 31 Aug 2021 14:05
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/25914
