Multi-agent reinforcement learning for cost-aware collaborative task execution in energy-harvesting D2D networks

Binbin Huang, Xiao Liu, Shangguang Wang, Linxuan Pan, Victor Chang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


In device-to-device (D2D) networks, multiple resource-limited mobile devices cooperate with one another to execute computation tasks. Because the battery capacity of mobile devices is limited, the computation tasks running on a device terminate once its battery is depleted. To achieve sustainable computation, energy-harvesting technology has been introduced into D2D networks. How to make multiple energy-harvesting mobile devices work collaboratively to minimize the long-term system cost of task execution under limited computing, network, and battery capacity remains a challenging issue. To address this challenge, we design a multi-agent deep deterministic policy gradient (MADDPG) based cost-aware collaborative task-execution (CACTE) scheme for energy-harvesting D2D (EH-D2D) networks. To validate the scheme's performance, we conducted extensive experiments comparing CACTE with four baseline algorithms: Local, Random, ECLB (Energy Capacity Load Balance), and CCLB (Computing Capacity Load Balance). The experiments varied system parameters such as the mobile devices' battery capacity, the task workload, and the bandwidth. The results show that the CACTE scheme enables multiple mobile devices to cooperate effectively, executing many more tasks and achieving a higher long-term reward, with lower task latency and fewer dropped tasks.
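The abstract names the four baseline algorithms but does not define them. As a rough illustration only, the sketch below shows one plausible reading of each name: Local keeps the task on the originating device, Random picks any device, ECLB picks the device with the most remaining battery energy, and CCLB picks the device with the most spare computing capacity. The `Device` fields, the `choose_device` function, and these selection rules are all assumptions made for this sketch, not the paper's definitions.

```python
import random
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device state; field names and units are assumptions."""
    name: str
    battery: float   # remaining battery energy
    cpu_free: float  # spare computing capacity


def choose_device(policy, local, devices, rng=None):
    """Return the device that should execute a task arriving at `local`.

    The rules below are assumed readings of the baseline names only.
    """
    if policy == "Local":
        # Always execute on the device where the task arrived.
        return local
    if policy == "Random":
        # Pick any device uniformly at random.
        rng = rng or random.Random()
        return rng.choice(devices)
    if policy == "ECLB":
        # Assumed rule: favor the device with the most remaining energy.
        return max(devices, key=lambda d: d.battery)
    if policy == "CCLB":
        # Assumed rule: favor the device with the most spare compute.
        return max(devices, key=lambda d: d.cpu_free)
    raise ValueError(f"unknown policy: {policy}")


devices = [
    Device("a", battery=5.0, cpu_free=1.0),
    Device("b", battery=2.0, cpu_free=9.0),
    Device("c", battery=8.0, cpu_free=3.0),
]
for policy in ("Local", "Random", "ECLB", "CCLB"):
    chosen = choose_device(policy, devices[0], devices, random.Random(0))
    print(policy, "->", chosen.name)
```

Each of these rules looks only at the current state of the devices, whereas the CACTE scheme described above is learned with MADDPG to minimize a long-term cost.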

Original language: English
Article number: 108176
Journal: Computer Networks
Early online date: 29 May 2021
Publication status: Published - 4 Aug 2021

Bibliographical note

© 2021, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

Funding Information:
This work was supported by the National Science Foundation of China (No. 61802095, 61572162, 61572251), the Zhejiang Provincial National Science Foundation of China (No. LQ19F020011, LQ17F020003), the Zhejiang Provincial Key Science and Technology Project Foundation (No. 2018C01012), the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) (No. SKLNST-2019-2-15), and VC Research (VCR 0000111).


  • collaborative task execution
  • cost-aware
  • D2D networks
  • multi-agent deep deterministic policy gradient
  • partially observable Markov decision process

