Bick, Daniel (2021) Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization. Master's Thesis / Essay, Artificial Intelligence.
Full text: mAI_2021_BickD.pdf (636 kB)
toestemming.pdf (restricted to registered users; 130 kB)
Abstract
Reinforcement Learning (RL), and these days particularly Deep Reinforcement Learning (DRL), is concerned with the development, study, and application of algorithms designed to accomplish some arbitrary task by learning a decision-making strategy that aims to maximize a cumulative performance measure. While this class of machine learning algorithms has become increasingly successful on a variety of tasks in recent years, some of the algorithms developed in this field are suboptimally documented. One example of a suboptimally documented DRL algorithm is Proximal Policy Optimization (PPO), a so-called model-free policy gradient method (PGM). Since PPO is a state-of-the-art representative of the important class of PGMs, yet can hardly be understood from the paper that introduced it alone, this report aims to explain PPO in detail. In doing so, the report sheds light on many concepts that generalize to the wider field of PGMs. In addition, a reference implementation of PPO has been developed, which is briefly introduced and evaluated. Lastly, this report examines the limitations of PPO and briefly touches on whether DRL might lead to the emergence of Artificial General Intelligence in the future.
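For orientation, the central object the thesis explains is PPO's clipped surrogate objective as defined in the paper that introduced the algorithm (Schulman et al., 2017), written here with the probability ratio $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$, an advantage estimate $\hat{A}_t$, and clipping parameter $\epsilon$:

$$
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]
$$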
| Item Type | Thesis (Master's Thesis / Essay) |
|---|---|
| Supervisor name | Jaeger, H. and Wiering, M.A. |
| Degree programme | Artificial Intelligence |
| Thesis type | Master's Thesis / Essay |
| Language | English |
| Date Deposited | 02 Sep 2021 11:24 |
| Last Modified | 02 Sep 2021 11:24 |
| URI | https://fse.studenttheses.ub.rug.nl/id/eprint/25709 |