Folkers, Naut (2024) Backup policies for safe reinforcement learning in Mario. Bachelor's Thesis, Artificial Intelligence.
Abstract
Is it possible to build a reinforcement learning agent that avoids dangerous mistakes while learning? The field of safe reinforcement learning aims to improve safety during the learning phase of RL algorithms. Experiments investigate the influence of a backup policy on a one-step actor-critic algorithm trained on Super Mario Bros (NES). When a critical state is encountered, a backup policy takes over with the goal of returning the agent to a safe state; this policy uses an alternate reward function that prioritizes safety over level progression. Challenges arise from the mismatch between the one-step actor-critic algorithm and the Super Mario Bros environment, such as policy collapse and failure to learn. Because of this lack of learning progress, the effect of backup policies on the learning phase cannot be isolated. However, a distinction can be drawn between the safety reward functions used in the backup policy: rewards that penalize proximity to threats show more potential for threat avoidance than rewards that encourage maximum distance from threats.
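The switching mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: all names and thresholds (`threat_distance`, `CRITICAL_RADIUS`, `SAFE_RADIUS`) are hypothetical, and the proximity-penalty reward is modeled on the variant the abstract found more promising.

```python
# Hypothetical sketch of a backup-policy mechanism: a main policy drives
# level progression, and a backup policy with a safety-oriented reward
# takes over whenever a critical state is detected. Thresholds and names
# are illustrative assumptions, not taken from the thesis.

CRITICAL_RADIUS = 2.0   # distance below which a state counts as critical
SAFE_RADIUS = 5.0       # distance at which control returns to the main policy


def is_critical(threat_distance: float) -> bool:
    """A state is critical when the nearest threat is too close."""
    return threat_distance < CRITICAL_RADIUS


def safety_reward(threat_distance: float) -> float:
    """Backup reward: penalize proximity to the nearest threat."""
    return -max(0.0, CRITICAL_RADIUS - threat_distance)


def progression_reward(delta_x: float) -> float:
    """Main reward: level progression (rightward movement)."""
    return delta_x


def select_reward(threat_distance: float, delta_x: float, in_backup: bool) -> float:
    """Use the safety reward while the backup policy is in control."""
    if in_backup:
        return safety_reward(threat_distance)
    return progression_reward(delta_x)


def update_backup_status(threat_distance: float, in_backup: bool) -> bool:
    """Enter backup mode in a critical state; hand control back to the
    main policy only once the agent has returned to a safe state."""
    if is_critical(threat_distance):
        return True
    if threat_distance >= SAFE_RADIUS:
        return False
    return in_backup  # hysteresis: stay in the current mode in between
```

The gap between `CRITICAL_RADIUS` and `SAFE_RADIUS` gives the backup policy room to actually reach a safe state before control switches back, rather than oscillating at the critical boundary.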
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Cardenas Cartagena, J. D. |
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 03 May 2024 09:16 |
Last Modified: | 03 May 2024 11:05 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/32342 |