Investigating Mode-Switching and Reward Stream Separation in Hard-Exploration Problems

Bempt, Peter van den (2024) Investigating Mode-Switching and Reward Stream Separation in Hard-Exploration Problems. Bachelor's Thesis, Artificial Intelligence.

This paper showcases the importance of segregating extrinsic and intrinsic reward streams in the context of a novel hard-exploration task. Various mode-switching algorithms (algorithms with distinct 'modes' for exploration and exploitation) that employ different mechanisms are introduced, and their performance is evaluated in terms of discounted returns. Unlike the baseline Q-learning agent, which failed to escape the environment's local reward maximum, the mode-switching agents navigated the environment successfully and located the treasure consistently. While separating the reward streams benefited some agents, it decreased the performance of others, especially during training. The findings suggest that future research on the efficacy of reward stream separation should explore environments in which pure exploration through intrinsic motivation is not the optimal strategy.
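The abstract does not detail how the agents are built. As a hypothetical illustration of the two ideas it relies on, the Python sketch below keeps extrinsic and intrinsic rewards in separate tabular Q-learning streams, lets an "explore" mode act greedily on the intrinsic values and an "exploit" mode on the extrinsic ones, and scores an episode by its discounted return. The names `discounted_return` and `TwoStreamModeSwitchingAgent`, the count-based novelty bonus 1/sqrt(N(s)), and the rule "switch to exploit after the first non-zero extrinsic reward" are all assumptions made for illustration, not the thesis's method.

```python
import random
from collections import defaultdict


def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * r_t for one episode, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


class TwoStreamModeSwitchingAgent:
    """Hypothetical tabular agent with one Q-table per reward stream.

    Illustration only: the novelty bonus and the switching rule are assumptions.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.99):
        self.alpha = alpha
        self.gamma = gamma
        # Separate value estimates: one learns from extrinsic (task) reward,
        # the other from intrinsic (novelty) reward; they are never summed.
        self.q_ext = defaultdict(lambda: [0.0] * n_actions)
        self.q_int = defaultdict(lambda: [0.0] * n_actions)
        self.visits = defaultdict(int)
        self.mode = "explore"

    def intrinsic_reward(self, state):
        # Assumed count-based novelty bonus 1 / sqrt(N(s)).
        self.visits[state] += 1
        return 1.0 / self.visits[state] ** 0.5

    def act(self, state):
        # Explore mode follows the intrinsic Q-table, exploit mode the extrinsic
        # one; ties are broken uniformly at random.
        q = self.q_int[state] if self.mode == "explore" else self.q_ext[state]
        best = max(q)
        return random.choice([a for a, v in enumerate(q) if v == best])

    def update(self, s, a, r_ext, s_next, done):
        r_int = self.intrinsic_reward(s_next)
        # Independent one-step Q-learning update for each stream.
        for q, r in ((self.q_ext, r_ext), (self.q_int, r_int)):
            target = r if done else r + self.gamma * max(q[s_next])
            q[s][a] += self.alpha * (target - q[s][a])
        # Hypothetical switching rule: exploit once any extrinsic reward is seen.
        if r_ext != 0:
            self.mode = "exploit"
```

Keeping two tables means the novelty bonus never inflates the extrinsic value estimates, which is one way to realize stream separation; a combined-stream agent would instead learn a single Q-table on a weighted sum such as r_ext + beta * r_int.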

Item Type: Thesis (Bachelor's Thesis)
Supervisor name: Sabatelli, M.
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 27 Mar 2024 08:41
Last Modified: 27 Mar 2024 08:41