The effect of multimodality on the performance of a convolutional neural network.

Rogaar, D. (2017) The effect of multimodality on the performance of a convolutional neural network. Bachelor's Thesis, Artificial Intelligence.

Abstract

A classical convolutional neural network is connected to two inputs of different modalities (such as language, images, or music) to observe the effect on classification accuracy. The first modality is iconographic images, each representative of some concept. The second modality is the titles of the same concepts, converted to images. Each modality is processed by a convolutional neural network followed by a fully connected representation layer, with different parameters per modality. Connecting the convolutional networks to a fully connected classification layer allows the resulting bimodal classifier to be compared with control-condition networks. Initial results show that overfitted representations prevented the transfer learning required to connect the modalities. The perceived overfitting was counteracted using dropout, after which the bimodal network, when given both inputs, achieved significantly higher accuracy (98.6%) than the control conditions (93.6% on images and 93.0% on words).
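
The fusion step described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the thesis implementation: the convolutional stacks are omitted (each branch starts from an already extracted feature vector), and the layer sizes, the ReLU activation, the placement of dropout on the concatenated representation, and all function names (`dense`, `dropout`, `make_params`, `bimodal_forward`) are assumptions made for illustration only.

```python
import math
import random


def dense(vector, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def relu(vector):
    return [max(0.0, v) for v in vector]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(vector, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(vector)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vector]


def random_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_params(n_image, n_word, n_repr, n_classes, rng, dropout_rate=0.5):
    """All sizes and the dropout rate are illustrative, not taken from the thesis."""
    return {
        "image_repr": random_layer(n_image, n_repr, rng),
        "word_repr": random_layer(n_word, n_repr, rng),
        "classifier": random_layer(2 * n_repr, n_classes, rng),
        "dropout_rate": dropout_rate,
    }


def bimodal_forward(image_features, word_features, params, rng, training=False):
    """Late fusion: one representation layer per modality, one joint classifier."""
    image_repr = relu(dense(image_features, *params["image_repr"]))
    word_repr = relu(dense(word_features, *params["word_repr"]))
    # Concatenate both representations; dropout placement here is an assumption.
    fused = dropout(image_repr + word_repr, params["dropout_rate"], rng, training)
    return softmax(dense(fused, *params["classifier"]))
```

Calling `bimodal_forward` with both feature vectors returns one probability per class. A control-condition network in this sketch would keep a single branch and connect its representation directly to its own classification layer.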

Item Type:	Thesis (Bachelor's Thesis)
Degree programme:	Artificial Intelligence
Thesis type:	Bachelor's Thesis
Language:	English
Date Deposited:	15 Feb 2018 08:31
Last Modified:	15 Feb 2018 08:31
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/15776
