Walczynska, Julia (2022) HandTalk: American sign language recognition by 3D-CNNs. Bachelor's Thesis, Artificial Intelligence.
Abstract
The goal of this project is to build a resource-efficient American Sign Language classification model suitable for potential deployment on mobile devices. This is a particularly demanding visual recognition problem due to the nature of sign language and the importance of each of its five parameters: hand shape, orientation, location, movement, and non-manual expression. To capture all of these properties, videos are used instead of still images, which in turn calls for a 3D convolutional neural network (3D-CNN). Such networks, however, tend to be large and slow, and attempts to introduce resource-efficient 3D-CNNs have been made only recently. This research investigates whether the resource-efficient MobileNetV2 architecture, inflated to a 3D version using 3D filters as described in 'Resource Efficient 3D Convolutional Neural Networks' by Kopuklu et al. (2019), is a suitable model for video-based sign language classification. The model was trained and tested on the Word-Level American Sign Language dataset and evaluated using accuracy, precision, recall, and F1-score. Its top-1 and top-5 accuracy are compared to Pose-GRU, Pose-TGCN, VGG-GRU, and I3D. The model achieved a top-1 accuracy of 51.51%, outperforming Pose-GRU and VGG-GRU. Furthermore, its top-5 accuracy of 86.32% is higher than that achieved by the other approaches.
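To make the architectural idea concrete, the sketch below shows what a MobileNetV2 inverted residual block looks like once its 2D filters are inflated to 3D, in the spirit of Kopuklu et al. (2019). This is a minimal illustration, not the thesis's actual implementation: the channel sizes, clip dimensions, and class names (`InvertedResidual3D`) are assumptions chosen for the example.

```python
# Minimal sketch (assuming PyTorch) of a 3D inverted residual block:
# MobileNetV2's 2D convolutions replaced by 3D ones so the depthwise
# filter spans the temporal dimension of the video clip as well.
import torch
import torch.nn as nn

class InvertedResidual3D(nn.Module):
    def __init__(self, in_ch, out_ch, stride=(1, 1, 1), expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        # Residual connection only when the block preserves shape.
        self.use_res = stride == (1, 1, 1) and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1x1 pointwise expansion
            nn.Conv3d(in_ch, hidden, kernel_size=1, bias=False),
            nn.BatchNorm3d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3x3 depthwise convolution: the "inflated" filter,
            # covering time as well as height and width
            nn.Conv3d(hidden, hidden, kernel_size=3, stride=stride,
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm3d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1x1 pointwise projection (linear, no activation)
            nn.Conv3d(hidden, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm3d(out_ch),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_res else self.block(x)

# Example: one clip of 16 RGB-derived feature frames at 112x112 resolution,
# laid out as (batch, channels, time, height, width).
clip = torch.randn(1, 24, 16, 112, 112)
out = InvertedResidual3D(24, 24)(clip)
print(out.shape)  # torch.Size([1, 24, 16, 112, 112])
```

Because the depthwise convolution is grouped per channel, the inflated block keeps the parameter count far below a standard 3D convolution of the same width, which is what makes the 3D MobileNetV2 plausible for mobile deployment.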
| Item Type: | Thesis (Bachelor's Thesis) |
|---|---|
| Supervisor name: | Lawrence, C.P. and Jaeger, H. |
| Degree programme: | Artificial Intelligence |
| Thesis type: | Bachelor's Thesis |
| Language: | English |
| Date Deposited: | 24 Aug 2022 09:50 |
| Last Modified: | 24 Aug 2022 09:50 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/28491 |