Boer, G.A. (2010) An incremental approach to real-time hand pose estimation using the GPU. Master's Thesis / Essay, Computing Science.
AI-MCS-2010-Boer.pdf - Published Version (6MB)
Abstract
The research presented is part of a project called “Augmented Reality for 3D Multi-user Interaction,” or ARMI for short. The goal of project ARMI is to develop a system that allows multiple users to interact with an augmented reality using their hands as input. Interaction requires no mouse or keyboard, and no markers or gloves are attached to the hands. The augmented reality is shared across the Internet so that multiple users can interact with the same environment. This allows users to discuss and change the design of a building, for instance. The hands of the users are replicated and displayed as virtual models so that each user knows what the others are pointing at. The augmented reality is displayed on a head-mounted display.

Four areas are researched for project ARMI: the 3D interface that displays the interactions and the augmented reality, the replication algorithm that communicates the changes made to the environment, a hand tracking algorithm that tracks the user’s hands in the video feed, and a hand pose estimation (HPE) algorithm that determines the correct pose and position of the hand. The HPE algorithm is described in this thesis.

To make sure there is enough processing power available, the HPE algorithm runs on the GPU. To use the GPU optimally, the best way to perform calculations on it is researched first. Next, a 3D hand model is built, which is matched onto the real hand in the video feed. The total degrees of freedom (DOF) of a hand can be reduced to nine DOFs and five weak constraints. The movement of the fingers is also constrained, and incorporating these constraints into the hand model shrinks the search space, which in turn improves performance. The HPE algorithm receives its input from the hand tracker, which marks each pixel that is part of the hand. The image is fed through a Sobel operator to retrieve all relevant edge information. A search algorithm then adjusts the hand model so that it matches the real hand in the video feed. This is done by subtracting the edges of the 3D model from the edges of the video feed. To determine whether certain settings result in a good fit, all the pixel information that is left after subtraction is summed. The result is a single value that describes the error of a particular setting. The search space, which contains all the settings, is searched by an optimization algorithm to find the best fit as fast as possible.

Three optimization algorithms are evaluated: the Secant method, Nelder-Mead, and Simulated Annealing. Each algorithm is tested on whether it can track a ball, an oblong, and a hand. Simulated Annealing gave the best results of the three. The final implementation of the system successfully tracks the hand in the video feed. However, it cannot accurately determine the complete pose of the hand, and it cannot perform the estimation process in real time, which makes it hard to use for augmented reality. Many improvements are possible, however: the input, speed, and estimation process can all be optimized. All in all, the research shows promise and has many possible applications.
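The fitting loop outlined in the abstract (Sobel edges, subtraction, a summed error, and a Simulated Annealing search) can be illustrated with a minimal CPU sketch. This is not the thesis implementation, which runs on the GPU and fits a 3D hand model. Everything below is an assumption made for illustration: the model is a 2D disc standing in for the "ball" test case, the function names (`render_disc`, `sobel_edges`, `edge_error`, `anneal`) are hypothetical, the error uses the absolute difference of edge magnitudes, and the linear cooling schedule and step size are arbitrary choices.

```python
import math
import random

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def render_disc(size, cx, cy, r):
    """Binary mask of a filled disc (assumed stand-in for the rendered model)."""
    return [[1.0 if (x - cx) ** 2 + (y - cy) ** 2 <= r * r else 0.0
             for x in range(size)] for y in range(size)]


def sobel_edges(img):
    """Gradient magnitude per pixel; the one-pixel border is left at zero."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_error(target_edges, params, size):
    """Sum of absolute edge differences (assumed form of the summed residual)."""
    model_edges = sobel_edges(render_disc(size, *params))
    return sum(abs(t - m)
               for t_row, m_row in zip(target_edges, model_edges)
               for t, m in zip(t_row, m_row))


def anneal(target_edges, start, size, steps=300, t0=50.0, seed=0):
    """Simulated annealing over (cx, cy, r) with an assumed linear cooling."""
    rng = random.Random(seed)
    current, cur_err = start, edge_error(target_edges, start, size)
    best, best_err = current, cur_err
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9
        # Perturb each parameter by at most one pixel.
        cand = tuple(p + rng.choice((-1, 0, 1)) for p in current)
        if cand[2] < 1:
            continue  # reject degenerate radii
        err = edge_error(target_edges, cand, size)
        # Always accept improvements; accept worse fits with Boltzmann probability.
        if err <= cur_err or rng.random() < math.exp((cur_err - err) / temp):
            current, cur_err = cand, err
            if err < best_err:
                best, best_err = cand, err
    return best, best_err


if __name__ == "__main__":
    size = 20
    target = sobel_edges(render_disc(size, 10, 10, 5))
    print(anneal(target, (7, 8, 3), size))
```

The sketch keeps the best setting seen so far, so the returned error never exceeds the error of the starting guess, although it is not guaranteed to reach the exact target parameters within a fixed number of steps.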
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Degree programme: | Computing Science |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 15 Feb 2018 07:31 |
Last Modified: | 15 Feb 2018 07:31 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/9181 |