Investigating Mode-Switching and Reward Stream Separation in Hard-Exploration Problems

Bempt, Peter van den (2024) Investigating Mode-Switching and Reward Stream Separation in Hard-Exploration Problems. Bachelor's Thesis, Artificial Intelligence.

Abstract

This paper showcases the importance of separating extrinsic and intrinsic reward streams in a novel hard-exploration task. Several mode-switching algorithms (algorithms with distinct 'modes' for exploration and exploitation) that use different switching mechanisms are introduced, and their performance is evaluated on discounted returns. The baseline Q-learning agent failed to escape the environment's local reward maximum. The mode-switching agents, by contrast, navigated the environment successfully and consistently located the treasure. Separating the reward streams benefited some agents but reduced the performance of others, especially during training. These findings suggest that future research on the efficacy of reward stream separation should use environments where pure exploration driven by intrinsic motivation is not the optimal strategy.
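The abstract does not describe how the agents are built. The Python sketch below is only an illustration of how reward stream separation and mode-switching can be combined in a tabular setting. It keeps one Q-table for extrinsic reward and a second one for intrinsic reward. In "exploit" mode the agent acts greedily on the extrinsic values, and in "explore" mode it acts greedily on the intrinsic values. The class name, the count-based novelty bonus 1/sqrt(N(s)), the "blind" random switching rule and all hyperparameters are hypothetical choices for this example. They are not the agents evaluated in the thesis. A helper for the discounted return, the evaluation measure the abstract mentions, is included as well.

```python
import random
from collections import defaultdict

GAMMA = 0.99  # discount factor (assumed value)
ALPHA = 0.1   # learning rate (assumed value)


def discounted_return(rewards, gamma=GAMMA):
    """Compute G = sum_t gamma^t * r_t by folding backwards over an episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


class SeparatedStreamAgent:
    """Hypothetical tabular agent with separate extrinsic/intrinsic Q-tables."""

    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.q_ext = defaultdict(lambda: [0.0] * n_actions)  # extrinsic stream
        self.q_int = defaultdict(lambda: [0.0] * n_actions)  # intrinsic stream
        self.visits = defaultdict(int)  # state visit counts for the novelty bonus
        self.mode = "exploit"
        self.rng = random.Random(seed)

    def intrinsic_reward(self, state):
        # Count-based novelty bonus 1/sqrt(N(s)) (an assumption for illustration).
        self.visits[state] += 1
        return self.visits[state] ** -0.5

    def maybe_switch_mode(self, p_switch=0.05):
        # "Blind" switching: flip the mode with a fixed probability (assumption).
        if self.rng.random() < p_switch:
            self.mode = "explore" if self.mode == "exploit" else "exploit"

    def act(self, state):
        # Each mode acts greedily on its own value stream; ties broken randomly.
        values = self.q_int[state] if self.mode == "explore" else self.q_ext[state]
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def update(self, s, a, r_ext, s_next, done):
        # One Q-learning update per stream, each with its own reward signal.
        r_int = self.intrinsic_reward(s_next)
        for q, r in ((self.q_ext, r_ext), (self.q_int, r_int)):
            target = r if done else r + GAMMA * max(q[s_next])
            q[s][a] += ALPHA * (target - q[s][a])
```

Because the two streams are never summed, the exploit mode is not distracted by novelty bonuses. The explore mode can still pursue novelty on its own. A single Q-table trained on `r_ext + r_int` could not separate these two behaviours this cleanly.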

Item Type:	Thesis (Bachelor's Thesis)
Supervisor name:	Sabatelli, M.
Degree programme:	Artificial Intelligence
Thesis type:	Bachelor's Thesis
Language:	English
Date Deposited:	27 Mar 2024 08:41
Last Modified:	27 Mar 2024 08:41
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/32162