Binarizing Word Embeddings using Straight-through Estimators

Entjes, Robin (2021) Binarizing Word Embeddings using Straight-through Estimators. Bachelor's Thesis, Artificial Intelligence.

Abstract

Word embeddings are usually represented as real-valued vectors that capture semantic and syntactic information about a word. Words that are semantically similar have similar word embeddings. However, real-valued word embeddings have downsides: computations on them are expensive, and they require a large amount of memory. Therefore, we investigate the possibility of transforming real-valued word embeddings into binary-valued word embeddings. In this research we compare two methods: an autoencoder that makes use of the Heaviside function, and one that is further extended with straight-through estimators. The two methods are compared on several standard word similarity tasks. Both binarization methods obtain similar results. However, the autoencoder using straight-through estimators performed significantly better on the SimLex dataset. It seems possible to binarize real-valued embeddings without a great loss of semantic information.
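
As an illustration of the two ingredients named in the abstract, the sketch below shows a Heaviside binarization step and how a straight-through estimator changes its backward pass. It is not taken from the thesis: the function names, the threshold convention at zero, the plain identity form of the estimator, and the Hamming-style similarity are all assumptions made for this example.

```python
from typing import List


def heaviside(pre_activations: List[float]) -> List[int]:
    """Forward pass: map each value to 1 if it is positive, else 0.

    Assumption: a value of exactly 0 maps to 0; other conventions exist.
    """
    return [1 if v > 0 else 0 for v in pre_activations]


def heaviside_true_grad(pre_activations: List[float],
                        upstream: List[float]) -> List[float]:
    """Exact derivative of the step function: zero almost everywhere,
    so no learning signal reaches the layers before the binarization."""
    return [0.0 for _ in pre_activations]


def heaviside_ste_grad(pre_activations: List[float],
                       upstream: List[float]) -> List[float]:
    """Straight-through estimator (identity variant, assumed here):
    the backward pass treats the step as the identity function and
    passes the upstream gradient through unchanged."""
    return list(upstream)


def hamming_similarity(a: List[int], b: List[int]) -> float:
    """Fraction of positions at which two binary codes agree
    (a hypothetical similarity measure for this illustration)."""
    if len(a) != len(b):
        raise ValueError("binary codes must have the same length")
    return sum(1 for p, q in zip(a, b) if p == q) / len(a)


# Hypothetical encoder outputs for two words, before binarization.
code_a = heaviside([0.3, -1.2, 0.0, 2.5])
code_b = heaviside([0.9, 0.4, -0.1, 1.1])
similarity = hamming_similarity(code_a, code_b)
```

With the exact derivative, every gradient entry is zero and the encoder weights cannot be updated through the binarization step; the straight-through variant keeps the binary forward pass while still providing a usable gradient signal.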

Item Type:	Thesis (Bachelor's Thesis)
Supervisor name:	Mostard, W. and Wiering, M.A.
Degree programme:	Artificial Intelligence
Thesis type:	Bachelor's Thesis
Language:	English
Date Deposited:	19 Jul 2021 13:25
Last Modified:	19 Jul 2021 13:25
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/25319
