
Backup policies for safe reinforcement learning in Mario

Folkers, Naut (2024) Backup policies for safe reinforcement learning in Mario. Bachelor's Thesis, Artificial Intelligence.



Is it possible to make a reinforcement learning agent that does not make dangerous mistakes when learning? The field of safe reinforcement learning aims to improve safety during the learning phase of RL algorithms. This thesis studies the influence of a backup policy on a one-step actor-critic agent trained on Super Mario Bros (NES). When a critical state is encountered, the backup policy takes over with the goal of returning the agent to a safe state. This policy uses an alternate reward function that prioritizes safety over level progression. Challenges emerge from the mismatch between the one-step actor-critic algorithm and the Super Mario Bros environment, such as policy collapse and failure to learn. Because of the lack of progress in the learning phase, the effect of backup policies on the learning phase could not be determined. However, a distinction can be drawn between the safety reward functions used in the backup policy: rewards that penalize proximity to threats show more potential for threat avoidance than rewards that encourage maximizing distance to threats.
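
The two families of safety reward contrasted in the abstract, and the hand-over to the backup policy, can be illustrated with a minimal sketch. The abstract does not give the actual reward formulas or the definition of a critical state, so everything below is an assumption made for illustration: the function names, the use of a single scalar distance to the nearest threat, the linear shapes, and the `radius` and `cap` values are all hypothetical.

```python
# Illustrative sketch only. The formulas, names, and thresholds here are
# assumptions; they are not taken from the thesis.

def proximity_penalty(distance: float, radius: float = 50.0) -> float:
    """Penalty-style reward: 0 outside `radius`, falling linearly to -1
    as the agent reaches the threat (distance 0)."""
    if distance >= radius:
        return 0.0
    return -(radius - distance) / radius


def distance_reward(distance: float, cap: float = 200.0) -> float:
    """Distance-style reward: grows linearly with distance to the nearest
    threat, saturating at 1 once `cap` is reached."""
    return min(distance, cap) / cap


def select_policy(distance: float, critical_radius: float = 50.0) -> str:
    """Hand control to the backup policy while the state is critical,
    here assumed to mean 'a threat is closer than `critical_radius`'."""
    return "backup" if distance < critical_radius else "task"


if __name__ == "__main__":
    for d in (0.0, 25.0, 50.0, 100.0, 400.0):
        print(d, select_policy(d), proximity_penalty(d), distance_reward(d))
```

Under these assumed shapes, the penalty-style reward is silent whenever no threat is nearby and only produces a signal inside the critical radius, whereas the distance-style reward keeps paying out for moving away even when the agent is already safe. This is one possible way to picture the difference between the two reward types; whether it explains the result reported in the abstract is not something the abstract states.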

Item Type: Thesis (Bachelor's Thesis)
Supervisor name: Cardenas Cartagena, J. D.
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 03 May 2024 09:16
Last Modified: 03 May 2024 11:05
