Visually Grounded Large Language Models (VGLLMs) for Robotic Manipulation in Unknown Environments

Lopez, Juan (2024) Visually Grounded Large Language Models (VGLLMs) for Robotic Manipulation in Unknown Environments. Master's Thesis / Essay, Artificial Intelligence.

Abstract

This thesis investigates the potential of Visually Grounded Large Language Models (VGLLMs) for few-shot robotic tabletop manipulation tasks. Six models were evaluated on their zero-shot grounding capabilities on a subset of OCID, HOTS, and a novel PyBullet dataset collected from a simulation environment. Uniquely, this study examines the ability of five VGLLMs to generate grounded robotic API calls from diverse natural language instructions that reference objects and target locations, using few-shot prompts of interleaved images, instructions, and expected grounded API calls. A novel robotic pipeline is proposed that unifies the traditionally separate stages of visual grounding and code generation within a single VGLLM pass, enabling users to instruct a robot to perform pick-and-place actions in natural language. The findings show that while most models can consistently generate visually grounded text, their performance leaves room for improvement. Zero-shot grounding achieved a maximum Intersection over Union (IoU) score of approximately 50% on the datasets. Few-shot prompting enabled grounded API call generation but achieved a maximum IoU of only around 30%. In the simulated environment, the best-performing model achieved a 30% success rate at grasping the correct object. Despite current limitations, this work highlights the potential of VGLLMs to unify robotic pipelines and motivates future research into fine-tuning these models with grounded robotic data.
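
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how Intersection over Union is commonly computed for two axis-aligned bounding boxes. It is not taken from the thesis: the function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box format are assumptions made for illustration, and the thesis may compute IoU over a different representation (for example, segmentation masks).

```python
from typing import Tuple

# Assumed box format (not confirmed by the thesis): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Return the Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)     # hypothetical model output
    ground_truth = (1.0, 1.0, 3.0, 3.0)  # hypothetical annotation
    print(f"IoU = {box_iou(predicted, ground_truth):.3f}")
```

Under this definition, an IoU of 50% means that the overlap between the predicted and annotated boxes covers half of their combined area.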

Item Type: Thesis (Master's Thesis / Essay)
Supervisor name: Mohades Kasaei, S.H. and Tashu, T. M. and Tziafas, G.T.
Degree programme: Artificial Intelligence
Thesis type: Master's Thesis / Essay
Language: English
Date Deposited: 11 Dec 2024 06:46
Last Modified: 11 Dec 2024 06:46
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/34487