Gallo, Marco Andrea (2022) Addressing Bootstrapping Errors in Offline Reinforcement Learning with Ensembles. Bachelor's Thesis, Artificial Intelligence.
Abstract
Interest in Reinforcement Learning has surged in recent years, in step with its success stories. Nonetheless, Reinforcement Learning systems are still not deployed to real-world applications on the scale of standard supervised learning models, which are able to exploit vast offline datasets. By using algorithms that can learn from data collected by other policies, off-policy Reinforcement Learning aims to improve on the low sample efficiency of standard online algorithms and to better exploit existing offline datasets. One key challenge for off-policy value-based algorithms is the bootstrapping error (Kumar et al., 2019), where actions outside the training data distribution incorrectly influence policy optimization. This error is exacerbated in the offline setting, and common uncertainty-based solutions rely on bootstrap ensembles. This research assesses whether the DQV algorithmic family (Sabatelli et al., 2020) benefits from the simple ensemble technique of Ensemble-DQN (Agarwal et al., 2020) for bootstrapping error control. Empirical studies are performed on two classic control OpenAI Gym environments, tracking the evolution of the algorithms' accumulated reward and value estimates during evaluation. Preliminary results indicate that offline DQV and DQV-Max are robust to bootstrapping errors due to their particular temporal difference updates. The pr
Item Type: | Thesis (Bachelor's Thesis) |
---|---|
Supervisor name: | Sabatelli, M. |
Degree programme: | Artificial Intelligence |
Thesis type: | Bachelor's Thesis |
Language: | English |
Date Deposited: | 27 Jul 2022 10:31 |
Last Modified: | 27 Jul 2022 10:31 |
URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/28178 |