Groot, Tobias (2024) Confidence is Key: Uncertainty Estimation in Large Language Models and Vision Language Models. Bachelor's Thesis, Artificial Intelligence.
Abstract
Large Language Models (LLMs) have revolutionized the field of artificial intelligence through their ability to understand and generate human-like text. Since these LLMs are deployed worldwide, ensuring their reliability is crucial. Uncertainty estimation has been shown to be a promising method for evaluating the reliability of predictions from machine learning algorithms. Despite its potential, little research has been conducted on uncertainty estimation in LLMs. This paper aims to contribute to the literature on this topic by evaluating the ability of LLMs to estimate their uncertainty in natural language processing (NLP) tasks. Furthermore, this paper extends the topic by evaluating the newly released Vision Language Models (VLMs) and their ability to estimate their uncertainty in an image recognition task. To investigate this, four LLMs are tested on three different NLP tasks. For all tasks, the models are prompted to express their confidence level for each answer. Additionally, two VLMs are tested in the same way on a novel image recognition dataset. The results show that both the LLMs and the VLMs have a high calibration error and are overconfident most of the time, indicating a poor capability for uncertainty estimation. The findings of this study provide a basis for future research on improving uncertainty estimation methods for LLMs.
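To make "calibration error" and "overconfident" concrete, the following is a minimal sketch of how such quantities are commonly computed from verbalized confidences. It assumes the expected calibration error (ECE) with equal-width bins and confidences scaled to [0, 1]; the abstract does not state which metric or binning the thesis uses, and the function names `expected_calibration_error` and `overconfidence_gap` are illustrative, not taken from the thesis.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """ECE with equal-width bins (an assumed metric, not confirmed by the source).

    confidences: stated confidence per answer, in [0, 1].
    correct: whether each answer was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins  # sum of confidences per bin
    hit_sum = [0] * n_bins     # number of correct answers per bin
    count = [0] * n_bins       # number of answers per bin

    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += int(ok)
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece


def overconfidence_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean confidence minus accuracy; a positive value indicates overconfidence."""
    n = len(confidences)
    return sum(confidences) / n - sum(int(ok) for ok in correct) / n
```

For example, a model that states 90% confidence on four answers but gets only two right has a large gap between confidence and accuracy, which both functions report as roughly 0.4.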
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Valdenegro Toro, M.A. |
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 11 Mar 2024 14:00 |
Last Modified: | 11 Mar 2024 14:35 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/32044 |