Lopez, Juan (2024) Visually Grounded Large Language Models (VGLLMs) for Robotic Manipulation in Unknown Environments. Master's Thesis / Essay, Artificial Intelligence.
Abstract
This thesis investigates the potential of Visually Grounded Large Language Models (VGLLMs) for few-shot robotic tabletop manipulation tasks. Six models were evaluated on their zero-shot grounding capabilities on subsets of the OCID and HOTS datasets and on a novel PyBullet dataset collected from a simulation environment. Uniquely, this study examines the ability of five VGLLMs to generate grounded robotic API calls from diverse natural language instructions that reference objects and target locations, using few-shot prompts of interleaved images, instructions, and expected grounded API calls. A novel robotic pipeline is proposed that unifies the traditionally separate stages of visual grounding and code generation within a single VGLLM pass, enabling users to instruct a robot to perform pick-and-place actions in natural language. The findings show that while most models can generate visually grounded text consistently, their performance leaves room for improvement. Zero-shot grounding achieved a maximum Intersection over Union (IoU) score of approximately 50% on the datasets. Few-shot prompting enabled grounded API call generation but achieved a maximum IoU of only around 30%. In the simulated environment, the best-performing model achieved a 30% success rate for grasping the correct object. Despite current limitations, this work highlights the potential of VGLLMs to unify robotic pipelines and motivates future research into fine-tuning these models with grounded robotic data.
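For readers unfamiliar with the IoU metric reported above, the following is a minimal sketch of how it can be computed for two axis-aligned bounding boxes. It assumes boxes are given as `(x_min, y_min, x_max, y_max)` tuples; the function name `box_iou` and this box format are illustrative assumptions, not taken from the thesis, which may compute IoU differently (for example on segmentation masks).

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption
    made for illustration, not the thesis's own convention.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) inputs.
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes that half overlap.
predicted = (0, 0, 10, 10)
ground_truth = (5, 0, 15, 10)
iou = box_iou(predicted, ground_truth)
```

Under this definition, an IoU of about 50% means the predicted and ground-truth regions share half of their combined area, so a score in that range can still correspond to a visibly offset box.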
| Item Type: | Thesis (Master's Thesis / Essay) |
|---|---|
| Supervisor name: | Mohades Kasaei, S.H. and Tashu, T. M. and Tziafas, G.T. |
| Degree programme: | Artificial Intelligence |
| Thesis type: | Master's Thesis / Essay |
| Language: | English |
| Date Deposited: | 11 Dec 2024 06:46 |
| Last Modified: | 11 Dec 2024 06:46 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/34487 |