Learning to Grasp Objects with Reinforcement Learning

Timmers, Rik (2018) Learning to Grasp Objects with Reinforcement Learning. Master's Thesis / Essay, Artificial Intelligence.


Abstract

In this project we use reinforcement learning, specifically the CACLA algorithm, to let an agent learn to control a robotic arm. The work is inspired by domestic service robots, which have to perform multiple complex tasks of which manipulation is only a small part. Using neural networks, an agent should be able to learn to complete a manipulation task without having to calculate paths and grasping points. We use a 6-degree-of-freedom robotic arm, the Mico, and run the experiments in a simulator called V-REP. We compare the results to a traditional, simple inverse kinematic solver to see if there is a gain in speed, accuracy or robustness. Whilst most agents use one neural network to perform their task, we experiment with different architectures, varying the number of neural networks that each control a subset of the joints, to see if this can improve results. Because exploration is very important in reinforcement learning, we test two exploration methods, Gaussian exploration and Ornstein-Uhlenbeck process exploration, to see whether the choice influences training. We first experimented with letting the end effector of the arm move to a certain position without grasping an object. When controlling only one joint, learning is very easy, but when controlling more joints even the problem of moving to a single location becomes more difficult to solve. While adding more training iterations can improve results, it also takes a lot longer to train the neural networks. By adding a pre-training stage that computes the forward kinematics to create the input state of the agent, without relying on any physics simulation, we can create more examples to learn from, improve results, and decrease the learning time. However, when trying to grasp objects, the extra pre-training stage does not help at all. By increasing the number of training iterations we can achieve some good results, and the agent is able to learn to grasp an object.
However, when using multiple networks that each control a subset of the joints, we can improve on these results, even reaching a 100% success rate for both exploration methods. This shows not only that multiple networks can outperform a single network, but also that the choice of exploration method does not influence training much. The downside is that training takes a very long time. Whilst the learned agent does not outperform the inverse kinematic solver, we have to take into account that the setup was relatively simple, which made the task very easy for the inverse kinematic solver.
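
To make the difference between the two exploration methods named in the abstract concrete, the following is a minimal Python sketch, not the thesis implementation. It shows Gaussian noise, which is drawn independently at every step, next to a discretised Ornstein-Uhlenbeck process, whose noise is correlated over time. The function and class names, and the parameter values `theta`, `sigma`, `mu` and `dt`, are assumptions chosen for illustration and do not come from the thesis.

```python
import random


def gaussian_noise(n_joints, sigma, rng):
    """Independent zero-mean Gaussian noise per joint, redrawn at every step."""
    return [rng.gauss(0.0, sigma) for _ in range(n_joints)]


class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise (hypothetical helper, illustrative defaults).

    Discretised update per joint:
        x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
    """

    def __init__(self, n_joints, theta=0.15, sigma=0.2, mu=0.0, dt=1.0, rng=None):
        self.theta = theta
        self.sigma = sigma
        self.mu = mu
        self.dt = dt
        self.rng = rng if rng is not None else random.Random()
        self.state = [mu] * n_joints

    def reset(self):
        """Return the process to its mean, e.g. at the start of an episode."""
        self.state = [self.mu] * len(self.state)

    def sample(self):
        """Advance the process one step and return the new noise vector."""
        self.state = [
            x
            + self.theta * (self.mu - x) * self.dt
            + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def explore(action, noise):
    """Add exploration noise to the joint action proposed by the actor."""
    return [a + n for a, n in zip(action, noise)]
```

In a training loop, the agent would execute `explore(action, noise)` instead of the actor's raw output. With Gaussian exploration each step's perturbation is unrelated to the previous one, whereas the Ornstein-Uhlenbeck process drifts smoothly, which tends to push the joints in a consistent direction over several consecutive steps.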

Item Type:	Thesis (Master's Thesis / Essay)
Supervisor name:	Wiering, M.A. and Schomaker, L.R.B.
Degree programme:	Artificial Intelligence
Thesis type:	Master's Thesis / Essay
Language:	English
Date Deposited:	26 Mar 2018
Last Modified:	27 Mar 2018 13:40
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/16612
