
Incorporating a Distance Metric to Induce Safe Behavior in Super Mario Bros using Deep Reinforcement Learning

Milenov, Viktor (2024) Incorporating a Distance Metric to Induce Safe Behavior in Super Mario Bros using Deep Reinforcement Learning. Bachelor's Thesis, Artificial Intelligence.

Abstract

Safe Reinforcement Learning (Safe RL) is a subfield of machine learning that aims to guarantee safe behavior in a system while optimizing its performance. To experiment with such algorithms and models we use video games, specifically Super Mario Bros, which offer environments where faulty behavior has no serious real-life consequences and therefore allow these algorithms and models to be modified and improved. We employ the Actor-Critic method because of its effective and stable training procedure, which learns a value function and a policy at the same time and directly modifies the gradient descent direction to guarantee system safety. Our model consists of an actor network and an ensemble of critic networks, the latter used to obtain a more accurate approximation of the value function. Furthermore, we incorporate a distance metric that measures the distance between the agent and the closest danger in the environment. We compare the performance of this safer model with a baseline model to assess whether the distance metric indirectly induces safe behavior of the agent in the environment. More broadly, we investigate ways to improve an agent's training to make it more "safe".
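
The abstract does not say how the distance metric is computed or how the critic ensemble is combined. As a minimal illustration only, the sketch below assumes Euclidean distance over 2D positions and a plain average of the critics' value estimates; the function names, coordinates, and numbers are hypothetical and are not taken from the thesis.

```python
import math
from statistics import mean


def distance_to_closest_danger(agent_xy, danger_positions):
    """Return the Euclidean distance from the agent to the nearest danger.

    Assumption: positions are (x, y) pairs in the same coordinate system.
    Returns math.inf when no danger is present.
    """
    if not danger_positions:
        return math.inf
    return min(math.dist(agent_xy, danger) for danger in danger_positions)


def ensemble_value(critic_estimates):
    """Combine several critics' value estimates by averaging them.

    Assumption: a simple mean; the thesis may combine critics differently.
    """
    return mean(critic_estimates)


# Hypothetical example values, not data from the thesis.
agent = (40.0, 79.0)
dangers = [(43.0, 83.0), (120.0, 79.0)]

closest = distance_to_closest_danger(agent, dangers)
value = ensemble_value([1.0, 2.0, 3.0])
```

A quantity like `closest` could, for example, be fed to the agent as an extra observation or used as a term in the reward; which of these, if either, the thesis adopts is not stated in the abstract.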

Item Type: Thesis (Bachelor's Thesis)
Supervisor name: Cardenas Cartagena, J. D.
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 03 May 2024 09:08
Last Modified: 03 May 2024 11:04
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/32365
