Binarizing Word Embeddings using Straight-through Estimators

Entjes, Robin (2021) Binarizing Word Embeddings using Straight-through Estimators. Bachelor's Thesis, Artificial Intelligence.

Abstract

Word embeddings are usually represented as real-valued vectors that encode semantic and syntactic information about a word, so that semantically similar words have similar embeddings. Real-valued word embeddings have downsides, however: computations on them are expensive and they require a large amount of memory. We therefore investigate the possibility of transforming real-valued word embeddings into binary-valued word embeddings. We compare two methods: an autoencoder that uses the Heaviside function and an autoencoder that is further extended with straight-through estimators. The two methods are evaluated on several standard word similarity tasks. Both binarization methods obtain similar results, although the autoencoder using straight-through estimators performed significantly better on the SimLex dataset. The results suggest that real-valued embeddings can be binarized without a great loss of semantic information.
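The two ideas named in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the example latent vectors, the convention that the Heaviside function maps 0 to 0, and the use of an identity gradient (rather than a clipped variant) are all assumptions made for illustration. The sketch shows a Heaviside forward pass, the straight-through backward pass that copies the incoming gradient unchanged, and why binary codes are cheap to store and compare.

```python
from typing import List


def heaviside(x: List[float]) -> List[int]:
    """Forward pass: 1 where the input is positive, else 0 (assumed convention at 0)."""
    return [1 if v > 0 else 0 for v in x]


def ste_backward(grad_output: List[float]) -> List[float]:
    """Straight-through estimator: pretend the step function was the identity
    and pass the incoming gradient through unchanged. The true derivative of
    the step function is zero almost everywhere, which would block learning."""
    return list(grad_output)


def pack_bits(bits: List[int]) -> int:
    """Pack a list of 0/1 values into one integer, first bit most significant."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code


def hamming_similarity(a: int, b: int, n_bits: int) -> float:
    """Fraction of bit positions on which two packed codes agree."""
    differing = bin(a ^ b).count("1")
    return 1.0 - differing / n_bits


# Hypothetical encoder pre-activations for two words (illustrative values only).
latent_cat = [0.7, -1.2, 0.1, -0.3]
latent_dog = [0.5, -0.8, -0.2, -0.1]

code_cat = pack_bits(heaviside(latent_cat))  # bits 1,0,1,0
code_dog = pack_bits(heaviside(latent_dog))  # bits 1,0,0,0
similarity = hamming_similarity(code_cat, code_dog, n_bits=4)
```

In this sketch each embedding dimension costs one bit instead of a 32-bit float, and comparing two words reduces to an XOR and a bit count. An actual training setup would presumably implement the same forward/backward split inside an automatic differentiation framework.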

Item Type: Thesis (Bachelor's Thesis)
Supervisor name: Mostard, W. and Wiering, M.A.
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 19 Jul 2021 13:25
Last Modified: 19 Jul 2021 13:25
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/25319
