Deep Learning for Audiovisual Speech Recognition using the Correspondence Task

Groefsema, Marc (2019) Deep Learning for Audiovisual Speech Recognition using the Correspondence Task. Master's Thesis / Essay, Artificial Intelligence.

Abstract

Humans observe the world in many ways: we see things, hear sounds, and smell scents. In multimodal learning, a model must likewise understand different types of input and be capable of handling different types of data, e.g. audio samples, video frames, or text. In robotics, sensor data streams such as LIDAR data and joint states might also be usable. This work addresses audiovisual speech recognition, using the datasets 'Lip reading in the wild' and 'Lip reading sentences in the wild'. It explores the correspondence task as a pretraining and transfer learning technique for word classification and sentence recognition. In this correspondence task, the feature extraction modules of the network architectures are pretrained to classify whether an audio stream and a video stream originate from the same video or from different videos. Two modality fusion techniques are considered for correspondence classification: classification based on the distance between modality features, and classification based on concatenated modality features. The question asked is whether the correspondence task yields useful features in a pretraining or transfer learning setting. In addition, the performance of the two fusion methods is compared. The results suggest that the correspondence task does indeed lead to useful feature extraction modules for a later classification task in audiovisual speech recognition.
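
The correspondence task and the two fusion options described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the clip identifiers, and the toy feature vectors are hypothetical, and Euclidean distance is an assumption, since the abstract does not state which distance measure is used. In practice the feature vectors would come from the audio and video feature extraction modules.

```python
import math
import random


def make_correspondence_pairs(clip_ids, rng):
    """Build (audio_id, video_id, label) triples for the correspondence task.

    Label 1: audio and video come from the same clip.
    Label 0: the audio is drawn from a different clip than the video.
    """
    pairs = []
    for clip in clip_ids:
        pairs.append((clip, clip, 1))  # matching pair
        other = rng.choice([c for c in clip_ids if c != clip])
        pairs.append((other, clip, 0))  # mismatched pair
    return pairs


def distance_fusion(audio_feat, video_feat):
    """Fuse modalities into one scalar: the Euclidean distance (assumed)."""
    squared = sum((a - v) ** 2 for a, v in zip(audio_feat, video_feat))
    return [math.sqrt(squared)]


def concat_fusion(audio_feat, video_feat):
    """Fuse modalities by concatenating the two feature vectors."""
    return list(audio_feat) + list(video_feat)


# Hypothetical clip identifiers and toy features, for illustration only.
rng = random.Random(0)
pairs = make_correspondence_pairs(["clip_a", "clip_b", "clip_c"], rng)

audio_feat = [0.0, 3.0]
video_feat = [4.0, 0.0]
dist_input = distance_fusion(audio_feat, video_feat)    # [5.0]
concat_input = concat_fusion(audio_feat, video_feat)    # [0.0, 3.0, 4.0, 0.0]
```

Either fused representation would then be passed to a binary classifier that predicts the correspondence label. Distance fusion gives the classifier a single number, which pushes the two extractors toward a shared embedding space, whereas concatenation leaves the comparison to the classifier itself.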

Item Type: Thesis (Master's Thesis / Essay)
Supervisor name: Wiering, M.A. and Schomaker, L.R.B.
Degree programme: Artificial Intelligence
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 29 Oct 2019
Last Modified: 30 Oct 2019 10:59
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/21154