Reward-Reinforced Generative Adversarial Networks for Multi-agent Systems

Changgang Zheng; Shufan Yang; Juan Marcelo Parra-Ullauri; Antonio Garcia-Dominguez; Nelly Bencomo

doi:10.1109/tetci.2021.3082204

Research output: Contribution to journal › Review article › peer-review

Abstract

Multi-agent systems deliver highly resilient and adaptable solutions for common problems in telecommunications, aerospace, and industrial robotics. However, achieving an optimal global goal remains a persistent obstacle for collaborative multi-agent systems, where learning affects the behaviour of more than one agent. A number of nonlinear function approximation methods have been proposed for solving the Bellman equation, which describes a recursive form of an optimal policy. However, how to leverage the value distribution based on reinforcement learning, and how to improve the efficiency and efficacy of such systems, remain challenges. In this work, we developed a reward-reinforced generative adversarial network to represent the distribution of the value function, replacing the approximation of Bellman updates. We demonstrated that our method is resilient and outperforms conventional reinforcement learning methods. This method is also applied to a practical case study: maximising the number of user connections to autonomous airborne base stations in a mobile communication network. Our method maximises the data likelihood using a cost function under which agents have optimal learned behaviours. This reward-reinforced generative adversarial network can be used as a generic framework for multi-agent learning at the system level.

Original language	English
Pages (from-to)	479-488
Number of pages	10
Journal	IEEE Transactions on Emerging Topics in Computational Intelligence
Volume	6
Issue number	3
Early online date	8 Jun 2021
DOIs	https://doi.org/10.1109/tetci.2021.3082204
Publication status	Published - Jun 2022

Keywords

Base stations
GAN
Generative adversarial networks
Generators
Mathematical model
Multi-agent systems
Reinforcement learning
Training
airborne base station (ABS)
multi-agent
reinforcement learning
reward-reinforced GAN

Access to Document

10.1109/tetci.2021.3082204

https://arxiv.org/abs/2103.12192 (Licence: CC BY-NC-SA 4.0)

Cite this

@article{1e45c98c225744b48ad9f2b52babbb76,

title = "Reward-Reinforced Generative Adversarial Networks for Multi-agent Systems",

abstract = "Multi-agent systems deliver highly resilient and adaptable solutions for common problems in telecommunications, aerospace, and industrial robotics. However, achieving an optimal global goal remains a persistent obstacle for collaborative multi-agent systems, where learning affects the behaviour of more than one agent. A number of nonlinear function approximation methods have been proposed for solving the Bellman equation, which describe a recursive format of an optimal policy. However, how to leverage the value distribution based on reinforcement learning, and how to improve the efficiency and efficacy of such systems remain a challenge. In this work, we developed a reward-reinforced generative adversarial network to represent the distribution of the value function, replacing the approximation of Bellman updates. We demonstrated our method is resilient and outperforms other conventional reinforcement learning methods. This method is also applied to a practical case study: maximising the number of user connections to autonomous airborne base stations in a mobile communication network. Our method maximises the data likelihood using a cost function under which agents have optimal learned behaviours. This reward-reinforced generative adversarial network can be used as a generic framework for multi-agent learning at the system level.",

keywords = "Base stations, GAN, Generative adversarial networks, Generators, Mathematical model, Multi-agent systems, Reinforcement learning, Training, airborne base station (ABS), multi-agent, reinforcement learning, reward-reinforced GAN",

author = "Changgang Zheng and Shufan Yang and Parra-Ullauri, {Juan Marcelo} and Antonio Garcia-Dominguez and Nelly Bencomo",

journal = "IEEE Transactions on Emerging Topics in Computational Intelligence",

year = "2022",

month = jun,

doi = "10.1109/tetci.2021.3082204",

language = "English",

volume = "6",

pages = "479--488",

number = "3",

}

TY - JOUR

T1 - Reward-Reinforced Generative Adversarial Networks for Multi-agent Systems

AU - Zheng, Changgang

AU - Yang, Shufan

AU - Parra-Ullauri, Juan Marcelo

AU - Garcia-Dominguez, Antonio

AU - Bencomo, Nelly

PY - 2022/6

Y1 - 2022/6

N2 - Multi-agent systems deliver highly resilient and adaptable solutions for common problems in telecommunications, aerospace, and industrial robotics. However, achieving an optimal global goal remains a persistent obstacle for collaborative multi-agent systems, where learning affects the behaviour of more than one agent. A number of nonlinear function approximation methods have been proposed for solving the Bellman equation, which describe a recursive format of an optimal policy. However, how to leverage the value distribution based on reinforcement learning, and how to improve the efficiency and efficacy of such systems remain a challenge. In this work, we developed a reward-reinforced generative adversarial network to represent the distribution of the value function, replacing the approximation of Bellman updates. We demonstrated our method is resilient and outperforms other conventional reinforcement learning methods. This method is also applied to a practical case study: maximising the number of user connections to autonomous airborne base stations in a mobile communication network. Our method maximises the data likelihood using a cost function under which agents have optimal learned behaviours. This reward-reinforced generative adversarial network can be used as a generic framework for multi-agent learning at the system level.

AB - Multi-agent systems deliver highly resilient and adaptable solutions for common problems in telecommunications, aerospace, and industrial robotics. However, achieving an optimal global goal remains a persistent obstacle for collaborative multi-agent systems, where learning affects the behaviour of more than one agent. A number of nonlinear function approximation methods have been proposed for solving the Bellman equation, which describe a recursive format of an optimal policy. However, how to leverage the value distribution based on reinforcement learning, and how to improve the efficiency and efficacy of such systems remain a challenge. In this work, we developed a reward-reinforced generative adversarial network to represent the distribution of the value function, replacing the approximation of Bellman updates. We demonstrated our method is resilient and outperforms other conventional reinforcement learning methods. This method is also applied to a practical case study: maximising the number of user connections to autonomous airborne base stations in a mobile communication network. Our method maximises the data likelihood using a cost function under which agents have optimal learned behaviours. This reward-reinforced generative adversarial network can be used as a generic framework for multi-agent learning at the system level.

KW - Base stations

KW - GAN

KW - Generative adversarial networks

KW - Generators

KW - Mathematical model

KW - Multi-agent systems

KW - Reinforcement learning

KW - Training

KW - airborne base station (ABS)

KW - multi-agent

KW - reinforcement learning

KW - reward-reinforced GAN

UR - https://ieeexplore.ieee.org/document/9448192

UR - http://www.scopus.com/inward/record.url?scp=85111003979&partnerID=8YFLogxK

U2 - 10.1109/tetci.2021.3082204

DO - 10.1109/tetci.2021.3082204

M3 - Review article

VL - 6

SP - 479

EP - 488

JO - IEEE Transactions on Emerging Topics in Computational Intelligence

JF - IEEE Transactions on Emerging Topics in Computational Intelligence

IS - 3

ER -
