Krol, Daan and Brandenburg, Jeroen van (2020) Q-learning adaptations in the game Othello. Bachelor's Thesis, Artificial Intelligence.
Abstract
Reinforcement learning algorithms are widely used to select actions that maximize the reward in a given situation. Q-learning is one such algorithm: it estimates the quality of performing an action in a certain state, and these estimates are continuously updated with each experience. In this thesis we compare different adaptations of the Q-learning algorithm for teaching an agent to play the board game Othello. We discuss the use of a second estimator in Double Q-learning, the addition of a V-value function in QV- and QV2-learning, and we consider the on-policy variant of Q-learning called SARSA. A multilayer perceptron is used as a function approximator and is compared to a convolutional neural network. Results indicate that SARSA, QV-, and QV2-learning perform better than Q-learning. The addition of a second estimator in Double Q-learning does not appear to improve the performance of Q-learning. SARSA and QV2-learning converge more slowly and struggle to escape local minima, whereas QV-learning converges faster. The results also show that the multilayer perceptron outperforms the convolutional neural network.
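For reference, the standard one-step update rules for the algorithms named in the abstract are sketched below. These are the textbook forms from the reinforcement learning literature, not taken from the thesis itself, which may use eligibility traces, different learning schedules, or the network-based variants it describes; here α denotes the learning rate and γ the discount factor.

```latex
% Standard one-step update rules (textbook forms, assumed for illustration;
% \alpha = learning rate, \gamma = discount factor).

% Q-learning (off-policy): bootstrap from the greedy next action.
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \bigl( r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \bigr)

% SARSA (on-policy): bootstrap from the action actually taken next.
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \bigl( r_t + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \bigr)

% QV-learning: a separate state-value function V is learned by TD,
% and the Q-update bootstraps from V instead of from Q.
V(s_t) \leftarrow V(s_t)
  + \alpha \bigl( r_t + \gamma V(s_{t+1}) - V(s_t) \bigr)
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \bigl( r_t + \gamma V(s_{t+1}) - Q(s_t, a_t) \bigr)

% Double Q-learning: two estimators Q^A and Q^B; one selects the greedy
% action, the other evaluates it (shown here for updating Q^A).
Q^A(s_t, a_t) \leftarrow Q^A(s_t, a_t)
  + \alpha \Bigl( r_t + \gamma\, Q^B\bigl(s_{t+1},
      \arg\max_{a} Q^A(s_{t+1}, a)\bigr) - Q^A(s_t, a_t) \Bigr)
```

QV2-learning, also compared in the thesis, is a variant of QV-learning; its exact update is not reproduced here.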
| Item Type: | Thesis (Bachelor's Thesis) |
|---|---|
| Supervisor name: | Wiering, M.A. |
| Degree programme: | Artificial Intelligence |
| Thesis type: | Bachelor's Thesis |
| Language: | English |
| Date Deposited: | 10 Aug 2020 07:04 |
| Last Modified: | 10 Aug 2020 07:04 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/23036 |