Bempt, Peter van den (2024) Investigating Mode-Switching and Reward Stream Separation in Hard-Exploration Problems. Bachelor's Thesis, Artificial Intelligence.
Abstract
This paper showcases the importance of separating extrinsic and intrinsic reward streams in a novel hard-exploration task. Several mode-switching algorithms (algorithms with distinct 'modes' for exploration and exploitation), each employing a different mechanism, are introduced, and their performance is evaluated on discounted returns. Unlike the Q-learning baseline agent, which failed to escape the environment's local reward maximum, the mode-switching agents navigated the environment successfully and located the treasure consistently. While some agents benefited from the separation of reward streams, others performed worse with it, especially during training. The findings suggest that future research on the efficacy of reward stream separation should explore environments where pure exploration through intrinsic motivation is not the optimal strategy.
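To make the terms in the abstract concrete, the following is a minimal sketch of reward stream separation in a tabular agent, together with the discounted return used as the evaluation metric. It is not the thesis's implementation: the class `TwoStreamAgent`, the count-based novelty bonus, the mode names `"explore"` and `"exploit"`, and all hyperparameters are assumptions made for illustration. The switching rule itself is left to the caller, since the abstract does not describe the mechanisms.

```python
import random
from collections import defaultdict


def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * r_t for one episode's reward sequence."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


class TwoStreamAgent:
    """Tabular agent keeping one Q-table per reward stream (hypothetical design)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, seed=0):
        self.alpha, self.gamma = alpha, gamma
        self.q_ext = defaultdict(lambda: [0.0] * n_actions)  # extrinsic stream
        self.q_int = defaultdict(lambda: [0.0] * n_actions)  # intrinsic stream
        self.visits = defaultdict(int)
        self.rng = random.Random(seed)

    def intrinsic_reward(self, state):
        # Assumed novelty bonus: 1 / sqrt(visit count) of the reached state.
        self.visits[state] += 1
        return 1.0 / self.visits[state] ** 0.5

    def act(self, state, mode):
        # The explore mode is greedy on intrinsic values, the exploit mode on
        # extrinsic values. Ties are broken at random.
        q = self.q_int[state] if mode == "explore" else self.q_ext[state]
        best = max(q)
        return self.rng.choice([a for a, v in enumerate(q) if v == best])

    def update(self, s, a, r_ext, s_next, done):
        # Each stream gets its own Q-learning update, so the intrinsic bonus
        # never leaks into the extrinsic value estimates.
        r_int = self.intrinsic_reward(s_next)
        for q, r in ((self.q_ext, r_ext), (self.q_int, r_int)):
            target = r if done else r + self.gamma * max(q[s_next])
            q[s][a] += self.alpha * (target - q[s][a])
```

In this sketch, a single transition with zero extrinsic reward changes only the intrinsic table, which is the property that separation is meant to guarantee. A non-separated agent would instead add the two rewards and learn a single table.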
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Sabatelli, M. |
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 27 Mar 2024 08:41 |
Last Modified: | 27 Mar 2024 08:41 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/32162 |