Desai, Nachiket (2022) DenseTransformer: Direct 6D OPE using self-attention on dense representations. Master's Thesis / Essay, Artificial Intelligence.
Abstract
This paper proposes a framework for single-shot object pose estimation that leverages transformers on joint RGB and point-cloud features. Our key study is the use of transformers on a joint embedding produced by a bidirectional encoder-decoder network; in this way we examine the application of transformers and self-attention to intermediary features produced by an independent network. Our approach also uses point-cloud networks (PCNs) to extract geometric information, which keeps the model's complexity low. Furthermore, it opens a path for future research into applying transformers (which currently outperform previous mechanisms in a variety of image-processing tasks) to unified representations learnt from different networks. In our model architecture, we first use the aforementioned encoder-decoder pair to create a joint representation, which also utilizes a mapping of 2D to 3D features using KNN. The learnt representation is then fed to a transformer module to infer the spatial relevance of features in the joint embedding. This is followed by a set of simple convolutional modules to estimate class and pose. We evaluated our model on the LineMod and YCB datasets using the average distance metric (ADD/ADD(S)). The results show performance competitive with the existing state of the art.
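The evaluation uses the average distance metrics ADD (for non-symmetric objects) and ADD(S) (for symmetric ones): the object's 3D model points are transformed by both the ground-truth and the predicted pose, and the mean point distance is taken — directly for ADD, or against the closest predicted point for ADD(S). A minimal sketch of how these metrics are conventionally computed (the function names and numpy implementation are illustrative, not taken from the thesis):

```python
import numpy as np

def add_metric(model_points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean Euclidean distance between corresponding model points
    transformed by the ground-truth and predicted poses."""
    gt = model_points @ R_gt.T + t_gt
    pred = model_points @ R_pred.T + t_pred
    return np.linalg.norm(gt - pred, axis=1).mean()

def add_s_metric(model_points, R_gt, t_gt, R_pred, t_pred):
    """ADD(S) for symmetric objects: mean distance from each
    ground-truth point to its *closest* predicted point."""
    gt = model_points @ R_gt.T + t_gt
    pred = model_points @ R_pred.T + t_pred
    # pairwise distance matrix; minimise over predicted points
    d = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

A pose is typically counted as correct when the metric falls below a threshold, commonly 10% of the object's model diameter.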
| Item Type | Thesis (Master's Thesis / Essay) |
|---|---|
| Supervisor name | Schomaker, L.R.B. |
| Degree programme | Artificial Intelligence |
| Thesis type | Master's Thesis / Essay |
| Language | English |
| Date Deposited | 25 Nov 2022 13:16 |
| Last Modified | 25 Nov 2022 13:16 |
| URI | https://fse.studenttheses.ub.rug.nl/id/eprint/28988 |