Kooi, T. (2010). Region enhanced neural Q-learning in partially observable Markov decision processes. Bachelor's Thesis, Artificial Intelligence.
Full text: BachelorThesis_Tkooi.pdf (Published Version, 1MB)
Abstract
To perform tasks autonomously, a robot has to plan its behavior and make decisions based on the input it receives. Unfortunately, contemporary robot sensors and actuators are subject to noise, which makes optimal decision making a stochastic problem. This process can be modeled as a partially observable Markov decision process (POMDP). In this thesis we introduce RENQ, a new POMDP algorithm that combines neural networks for estimating Q-values with a spatial pyramid constructed over the state space. RENQ augments the state-based belief vector with region-based belief vectors derived from the pyramid, and feeds both as input to a neural network trained with Q-learning. We compare RENQ to Qmdp and Perseus, two state-of-the-art algorithms for approximately solving model-based POMDPs. The results on three different maze navigation tasks indicate that RENQ outperforms Perseus on all problems and outperforms Qmdp as the problem size grows.
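The abstract gives no implementation details, but its core idea — pooling a state belief over a spatial pyramid and feeding the concatenated state-based and region-based beliefs to a Q-network — can be sketched. The following is a minimal illustration under stated assumptions, not the thesis implementation: the 8x8 maze, the 2x2 and 4x4 pyramid levels, the network size, and all function names are assumptions made for the example.

```python
import numpy as np

GRID = 8                        # assumed 8x8 maze (illustrative)
N_STATES = GRID * GRID
N_ACTIONS = 4                   # e.g. move north/south/east/west

def region_beliefs(belief, levels=(2, 4)):
    """Pool the state belief over a spatial pyramid of coarser grids."""
    b = belief.reshape(GRID, GRID)
    pooled = []
    for k in levels:            # k x k regions at each pyramid level
        step = GRID // k
        for i in range(k):
            for j in range(k):
                pooled.append(b[i*step:(i+1)*step, j*step:(j+1)*step].sum())
    return np.array(pooled)

def renq_input(belief):
    """Concatenate the state-based and region-based belief vectors."""
    return np.concatenate([belief, region_beliefs(belief)])

# A tiny one-hidden-layer Q-network with hand-coded backpropagation.
rng = np.random.default_rng(0)
N_IN = N_STATES + 2*2 + 4*4     # 64 state beliefs + 4 + 16 region beliefs
W1 = rng.normal(0.0, 0.1, (N_IN, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.1, (32, N_ACTIONS)); b2 = np.zeros(N_ACTIONS)

def q_values(x):
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2, h

def q_learning_step(x, a, r, x_next, alpha=0.01, gamma=0.95):
    """One Q-learning update: move Q(x, a) toward r + gamma * max_a' Q(x', a')."""
    global W1, b1, W2, b2
    q, h = q_values(x)
    q_next, _ = q_values(x_next)
    err = r + gamma * q_next.max() - q[a]    # TD error
    grad_q = np.zeros(N_ACTIONS); grad_q[a] = -err
    grad_h = (grad_q @ W2.T) * (1.0 - h**2)  # backprop through tanh
    W2 -= alpha * np.outer(h, grad_q); b2 -= alpha * grad_q
    W1 -= alpha * np.outer(x, grad_h); b1 -= alpha * grad_h

# Usage: a uniform belief over the maze, one greedy action, one update.
belief = np.full(N_STATES, 1.0 / N_STATES)
x = renq_input(belief)
q, _ = q_values(x)
q_learning_step(x, a=int(q.argmax()), r=1.0, x_next=x)
```

In a full POMDP agent the belief would be updated with a Bayes filter after every action and observation; the pyramid pooling simply gives the network a coarse-to-fine summary of where the probability mass lies, which is presumably the "region enhancement" the title refers to.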
Item Type: Thesis (Bachelor's Thesis)
Degree programme: Artificial Intelligence
Thesis type: Bachelor's Thesis
Language: English
Date Deposited: 15 Feb 2018 07:31
Last Modified: 15 Feb 2018 07:31
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/9133