Reinforcement learning and games : temporal-difference algorithms for gameplay and their performance on playing 5x5 Go

Ekker, R.J. (2003) Reinforcement learning and games : temporal-difference algorithms for gameplay and their performance on playing 5x5 Go. Master's Thesis / Essay, Artificial Intelligence.

Abstract

Reinforcement learning is applied to computer play of 5x5 Go. We found that incorporating a model of the opponent into the overall dynamics of gameplay, using the TD-mu algorithm, significantly increases performance compared with other TD-learning algorithms such as TD-leaf. However, the choice of a proper feature set and search topology plays an essential role. Experimental evaluation was performed using WALLY, a simple public-domain Go-playing program, with the program GNU Go used to score games and determine the winner. Using a 2-ply search, a neural network could be trained to beat WALLY on each occasion, while showing its ability to learn. Starting every game with two random moves enforced a proper exploration of conditions. A multi-layer perceptron with 32 inputs, 75 hidden neurons and a single output neuron was used for game state evaluation. Training was done using RPROP and Baird's residuals algorithm.
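
As a rough illustration of the evaluator described in the abstract, the following sketch builds a perceptron with 32 inputs, 75 hidden neurons and one output, and evaluates a single feature vector. Only the layer sizes come from the abstract. The tanh activations, the weight initialisation range, and the names `init_network` and `evaluate` are assumptions made for this example and may differ from the thesis; the feature extraction, the 2-ply search and the RPROP / residual-gradient training are not shown.

```python
import math
import random

N_INPUTS, N_HIDDEN = 32, 75  # layer sizes stated in the abstract


def init_network(rng):
    """Return small random weights for a 32-75-1 perceptron.

    The uniform(-0.1, 0.1) initialisation is an assumption for this sketch.
    """
    # Each hidden row holds N_INPUTS weights followed by one bias term.
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS + 1)]
                for _ in range(N_HIDDEN)]
    # The output neuron holds N_HIDDEN weights followed by one bias term.
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN + 1)]
    return w_hidden, w_out


def evaluate(network, features):
    """Map a 32-element feature vector to one value in (-1, 1).

    tanh is an assumed activation; the abstract does not name one.
    """
    w_hidden, w_out = network
    if len(features) != N_INPUTS:
        raise ValueError("expected %d features" % N_INPUTS)
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row[:-1], features)) + row[-1])
        for row in w_hidden
    ]
    return math.tanh(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])


network = init_network(random.Random(0))
value = evaluate(network, [0.0] * N_INPUTS)  # placeholder all-zero feature vector
```

In a setup like the one the abstract describes, a value of this kind would be computed for the positions reached by the search and used to choose a move; how the thesis combines the two is not specified here.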

Item Type:	Thesis (Master's Thesis / Essay)
Degree programme:	Artificial Intelligence
Thesis type:	Master's Thesis / Essay
Language:	English
Date Deposited:	15 Feb 2018 07:29
Last Modified:	15 Feb 2018 07:29
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/8663
