Walczynska, Julia (2022) HandTalk: American sign language recognition by 3D-CNNs. Bachelor's Thesis, Artificial Intelligence.
Abstract
The goal of this project is to build a resource-efficient American Sign Language classification model suitable for potential deployment on mobile devices. This is a particularly demanding visual recognition problem due to the nature of sign language and the importance of each of its five parameters: hand shape, orientation, location, movement, and non-manual expression. To capture all of these properties, videos are used instead of still images, which in turn calls for a 3D convolutional neural network (3D-CNN). Such networks, however, tend to be large and slow, and attempts to introduce resource-efficient 3D-CNNs have been made only recently. This research investigates whether the resource-efficient MobileNetV2 architecture, inflated to a 3D version using 3D filters as described in 'Resource Efficient 3D Convolutional Neural Networks' by Kopuklu et al. (2019), is a suitable model for video-based sign language classification. The model was trained and tested on the Word-Level American Sign Language dataset and evaluated using accuracy, precision, recall, and F1-score. Its top-1 and top-5 accuracy are compared to Pose-GRU, Pose-TGCN, VGG-GRU, and I3D. The model achieved a top-1 accuracy of 51.51%, outperforming Pose-GRU and VGG-GRU. Furthermore, its top-5 accuracy of 86.32% is higher than that achieved by the other approaches.
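To make the architectural idea concrete, the sketch below shows what a MobileNetV2 inverted residual block looks like once its 2D filters are inflated to 3D, in the spirit of Kopuklu et al. (2019). This is a minimal illustration, not the thesis's actual implementation: the channel sizes, clip dimensions, and class names (`InvertedResidual3D`) are assumptions chosen for the example.

```python
# Minimal sketch (assuming PyTorch) of a 3D inverted residual block:
# MobileNetV2's 2D convolutions replaced by 3D ones so the depthwise
# filter spans the temporal dimension of the video clip as well.
import torch
import torch.nn as nn

class InvertedResidual3D(nn.Module):
    def __init__(self, in_ch, out_ch, stride=(1, 1, 1), expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        # Residual connection only when the block preserves shape.
        self.use_res = stride == (1, 1, 1) and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1x1 pointwise expansion
            nn.Conv3d(in_ch, hidden, kernel_size=1, bias=False),
            nn.BatchNorm3d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3x3 depthwise convolution: the "inflated" filter,
            # covering time as well as height and width
            nn.Conv3d(hidden, hidden, kernel_size=3, stride=stride,
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm3d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1x1 pointwise projection (linear, no activation)
            nn.Conv3d(hidden, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm3d(out_ch),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_res else self.block(x)

# Example: one clip of 16 RGB-derived feature frames at 112x112 resolution,
# laid out as (batch, channels, time, height, width).
clip = torch.randn(1, 24, 16, 112, 112)
out = InvertedResidual3D(24, 24)(clip)
print(out.shape)  # torch.Size([1, 24, 16, 112, 112])
```

Because the depthwise convolution is grouped per channel, the inflated block keeps the parameter count far below a standard 3D convolution of the same width, which is what makes the 3D MobileNetV2 plausible for mobile deployment.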
| Item Type: | Thesis (Bachelor's Thesis) |
|---|---|
| Supervisor name: | Lawrence, C.P. and Jaeger, H. |
| Degree programme: | Artificial Intelligence |
| Thesis type: | Bachelor's Thesis |
| Language: | English |
| Date Deposited: | 24 Aug 2022 09:50 |
| Last Modified: | 24 Aug 2022 09:50 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/28491 |