Ekker, R.J. (2003) Reinforcement learning and games : temporal-difference algorithms for gameplay and their performance on playing 5x5 Go. Master's Thesis / Essay, Artificial Intelligence.
Abstract
Reinforcement learning is applied to computer play of 5x5 Go. We found that incorporating a model of the opponent into the overall dynamics of gameplay, using the TD-mu algorithm, significantly increases performance compared with other TD-learning algorithms such as TD-leaf. However, the choice of a proper feature set and search topology plays an essential role. Experimental evaluation was performed using WALLY, a simple public-domain Go-playing program, and the program GNU Go to score games and determine the winner. Using a 2-ply search, a neural network could be trained to beat WALLY on each occasion, while showing its ability to learn. Starting every game with two random moves enforced proper exploration of game conditions. A multi-layer perceptron with 32 inputs, 75 hidden neurons and a single output neuron was used for game state evaluation. Training was done using RPROP and Baird's residuals algorithm.
Item Type: | Thesis (Master's Thesis / Essay) |
---|---|
Degree programme: | Artificial Intelligence |
Thesis type: | Master's Thesis / Essay |
Language: | English |
Date Deposited: | 15 Feb 2018 07:29 |
Last Modified: | 15 Feb 2018 07:29 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/8663 |