Bick, Daniel (2021) Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization. Master's Thesis / Essay, Artificial Intelligence.
Full text: mAI_2021_BickD.pdf (636 kB)
toestemming.pdf (restricted to registered users; 130 kB)
Abstract
Reinforcement Learning (RL), and these days particularly Deep Reinforcement Learning (DRL), is concerned with the development, study, and application of algorithms designed to accomplish some arbitrary task by learning a decision-making strategy that aims to maximize a cumulative performance measure. While this class of machine learning algorithms has become increasingly successful on a variety of tasks in recent years, some of the algorithms developed in this field are suboptimally documented. One example of a suboptimally documented DRL algorithm is Proximal Policy Optimization (PPO), a so-called model-free policy gradient method (PGM). Since PPO is a state-of-the-art representative of the important class of PGMs, yet can hardly be understood from the paper that introduced it alone, this report aims to explain PPO in detail. In doing so, the report sheds light on many concepts that generalize to the wider field of PGMs. In addition, a reference implementation of PPO has been developed, which is briefly introduced and evaluated. Lastly, this report examines the limitations of PPO and briefly touches on whether DRL might lead to the emergence of Artificial General Intelligence in the future.
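For orientation, the central object the thesis explains is PPO's clipped surrogate objective as defined in the paper that introduced the algorithm (Schulman et al., 2017), written here with the probability ratio $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$, an advantage estimate $\hat{A}_t$, and clipping parameter $\epsilon$:

$$
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]
$$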
| Item Type | Thesis (Master's Thesis / Essay) |
|---|---|
| Supervisor name | Jaeger, H. and Wiering, M.A. |
| Degree programme | Artificial Intelligence |
| Thesis type | Master's Thesis / Essay |
| Language | English |
| Date Deposited | 02 Sep 2021 11:24 |
| Last Modified | 02 Sep 2021 11:24 |
| URI | https://fse.studenttheses.ub.rug.nl/id/eprint/25709 |