Addressing Bootstrapping Errors in Offline Reinforcement Learning with Ensembles

Gallo, Marco Andrea (2022) Addressing Bootstrapping Errors in Offline Reinforcement Learning with Ensembles. Bachelor's Thesis, Artificial Intelligence.

Abstract

Interest in Reinforcement Learning has surged in recent years, in step with its success stories. Nonetheless, Reinforcement Learning systems are still not deployed to real-world applications on the scale of standard supervised learning models, which are able to exploit vast offline datasets. By learning from data collected by other policies, off-policy Reinforcement Learning aims to improve on the low sample efficiency of standard online algorithms and to make better use of existing offline datasets. One key challenge for off-policy value-based algorithms is the bootstrapping error (Kumar et al., 2019), where actions outside the training data distribution incorrectly influence policy optimization. This error is exacerbated in the offline setting, and common uncertainty-based solutions focus on bootstrap ensembles. This research assesses whether the DQV algorithmic family (Sabatelli et al., 2020) benefits from the simple ensemble technique of Ensemble-DQN (Agarwal et al., 2020) for bootstrapping error control. Empirical studies are performed on two classic control OpenAI Gym environments, tracking the algorithms' accumulated reward and the evolution of their value estimates during evaluation. Preliminary results found offline DQV and DQV-Max to be robust to bootstrapping errors due to their particular temporal difference updates. The pr
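
The temporal difference targets contrasted in the abstract can be sketched as follows. This is a minimal NumPy illustration based on the cited papers, not the code used in the thesis: all function names, array shapes, and the example numbers are assumptions made for this sketch. It shows that a DQN-style target takes a maximum over next-state actions (where out-of-distribution actions can enter), that a DQV-style target bootstraps from a state-value estimate instead, and that Ensemble-DQN trains each ensemble member against its own target while acting on the mean of the members' estimates.

```python
import numpy as np

# Hypothetical helper names; shapes are assumptions for this sketch.

def dqn_target(reward, done, gamma, q_next):
    """DQN-style target: r + gamma * max_a' Q(s', a').

    q_next has shape (batch, actions). The max may select actions
    that never appear in the offline dataset.
    """
    return reward + gamma * (1.0 - done) * q_next.max(axis=-1)

def dqv_target(reward, done, gamma, v_next):
    """DQV-style target: r + gamma * V(s'), with no max over actions.

    v_next has shape (batch,).
    """
    return reward + gamma * (1.0 - done) * v_next

def ensemble_dqn_targets(reward, done, gamma, q_next_heads):
    """Ensemble-DQN-style targets: one DQN target per ensemble member.

    q_next_heads has shape (K, batch, actions); the result has shape
    (K, batch), so each member bootstraps from its own estimate.
    """
    return reward + gamma * (1.0 - done) * q_next_heads.max(axis=-1)

def ensemble_greedy_action(q_heads):
    """Greedy action under the mean of K estimates, shape (K, actions)."""
    return int(q_heads.mean(axis=0).argmax())

# Made-up batch of two transitions; the second one is terminal.
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
gamma = 0.5
q_next = np.array([[1.0, 3.0], [2.0, 4.0]])
v_next = np.array([2.0, 5.0])
q_next_heads = np.stack([q_next, np.array([[0.0, 1.0], [9.0, 9.0]])])

print(dqn_target(reward, done, gamma, q_next))
print(dqv_target(reward, done, gamma, v_next))
print(ensemble_dqn_targets(reward, done, gamma, q_next_heads))
print(ensemble_greedy_action(np.array([[1.0, 0.0], [0.0, 3.0]])))
```

In DQV-Max, as described by Sabatelli et al., the state-value network would instead be regressed toward the max-based target while the Q-network keeps the value-based one; the sketch above omits network updates entirely and only computes targets.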

Item Type:	Thesis (Bachelor's Thesis)
Supervisor name:	Sabatelli, M.
Degree programme:	Artificial Intelligence
Thesis type:	Bachelor's Thesis
Language:	English
Date Deposited:	27 Jul 2022 10:31
Last Modified:	27 Jul 2022 10:31
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/28178