Communicating Intention in Decentralized Multi-Agent Multi-Objective Reinforcement Learning Systems

Emami Alagha, Hirad (2019) Communicating Intention in Decentralized Multi-Agent Multi-Objective Reinforcement Learning Systems. Master's Thesis / Essay, Artificial Intelligence.

Abstract

In the past few decades, many studies have proposed approaches to multi-agent reinforcement learning (MARL), and these approaches have found application in a variety of domains such as swarm robotics coordination, traffic-light control, and supply chain management. The methods usually fall into two categories, centralized and decentralized systems, each with certain advantages that come at the cost of some drawbacks. In a centralized system, a single top-level controller makes the action decisions of every agent in the world, whereas in a decentralized system, each agent is responsible for choosing its own actions and learning its own behavior policy using the local rewards it receives during the process. While decentralized systems can avoid the scalability problem that the monolithic centralized approach faces in complex problems, coordinating the agents to produce a coherent collective behavior that satisfies the global criteria remains a challenge. In this thesis, we compared the centralized and decentralized approaches to MARL in coordinating multiple agents in a shared 2-dimensional environment that contains two separate goal locations. We explored several approaches to communication in the decentralized systems to enhance coordination among the independent learners and help them overcome their limited observation of the environment. Five variations of decentralized MARL systems are proposed that differ in the range of communication they provide between the independent agents. To evaluate the effect of the communication mechanisms and compare the performance of the MARL systems under different conditions, four experiment scenarios were designed in which the difficulty of the task was varied through different world configurations and limited vision of the independent learners.
The results of our experiments show that the communication mechanisms can substantially enhance the performance of the baseline decentralized MARL system in the complex setups and also accelerate convergence in the average scenarios. Our method, which enables the agents to shape and communicate their intention using multi-objective reinforcement learning, learned faster than the centralized MARL system in the complex scenarios, even with limited observability of the environment. Similarly, policy-sharing outperforms the centralized MARL system in scenarios with a large number of agents and needs significantly shorter training sessions.

Item Type:	Thesis (Master's Thesis / Essay)
Supervisor name:	Wiering, M.A.
Degree programme:	Artificial Intelligence
Thesis type:	Master's Thesis / Essay
Language:	English
Date Deposited:	28 Mar 2019
Last Modified:	29 Mar 2019 13:15
URI:	https://fse.studenttheses.ub.rug.nl/id/eprint/19312
