Entjes, Robin (2021) Binarizing Word Embeddings using Straight-through Estimators. Bachelor's Thesis, Artificial Intelligence.
Abstract
Word embeddings are usually represented as real-valued vectors that encode semantic and syntactic information about a word, so that semantically similar words have similar embeddings. Real-valued word embeddings have two downsides, however: the calculations performed on them are computationally expensive, and they require a large amount of memory. We therefore investigate whether real-valued word embeddings can be transformed into binary-valued word embeddings. We compare two methods: an autoencoder that uses the Heaviside function, and an autoencoder that is further extended with straight-through estimators. The two methods are evaluated on several standard word similarity tasks. Both binarization methods obtain similar results, except on the SimLex dataset, where the autoencoder using straight-through estimators performed significantly better. These results suggest that real-valued embeddings can be binarized without a great loss of semantic information.
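To illustrate the idea behind the second method, the following is a minimal sketch of a straight-through estimator, not the thesis implementation. The Heaviside step has a zero derivative almost everywhere, so no gradient reaches the values in front of it; the straight-through estimator instead passes the gradient through the step unchanged. The toy squared-error objective, the function names, the learning rate, and the convention that an input of exactly 0 maps to 0 are all assumptions made for this example.

```python
def heaviside(z):
    """Forward pass: map each pre-activation to a bit (1.0 if positive, else 0.0).
    Mapping exactly 0 to 0.0 is an assumed convention."""
    return [1.0 if v > 0 else 0.0 for v in z]


def loss_grad_wrt_bits(bits, target):
    """Gradient of the toy loss sum((b - t)^2) with respect to the bits."""
    return [2.0 * (b - t) for b, t in zip(bits, target)]


def ste_backward(grad_bits):
    """Straight-through estimator (identity variant, assumed here):
    treat the Heaviside step as the identity in the backward pass."""
    return list(grad_bits)


def fit_preactivations(z, target, lr=0.1, steps=50):
    """Gradient descent on the pre-activations using the STE gradient."""
    z = list(z)
    for _ in range(steps):
        bits = heaviside(z)
        grad_z = ste_backward(loss_grad_wrt_bits(bits, target))
        z = [v - lr * g for v, g in zip(z, grad_z)]
    return z


z0 = [0.7, -0.3, 0.2, -0.9]        # hypothetical real-valued pre-activations
target = [0.0, 1.0, 1.0, 0.0]      # hypothetical target binary code
z = fit_preactivations(z0, target)

print(heaviside(z0))  # binary code before training
print(heaviside(z))   # binary code after training
```

With the true derivative of the Heaviside function, `grad_z` would be zero at every step and the pre-activations would never move. In an autoencoder the same substitution would let a reconstruction loss update the encoder weights through the binarization layer.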
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Mostard, W. and Wiering, M.A. |
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 19 Jul 2021 13:25 |
Last Modified: | 19 Jul 2021 13:25 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/25319 |