Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning

Yingyi Kuang; Abraham Itzhak Weinberg; George Vogiatzis; Diego R. Faria

doi:10.1109/RO-MAN47096.2020.9223473

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

Reinforcement learning for multi-goal robot manipulation tasks is challenging, especially when rewards are sparse. It often requires millions of collected data samples before a stable strategy is learned. Recent algorithms such as Hindsight Experience Replay (HER) have greatly accelerated learning by replacing the original desired goal with one of the achieved points (substitute goals) along the same trajectory. However, HER samples previous experience naively: both the trajectory selection and the substitute goal sampling are completely random. In this paper, we discuss an experience prioritization strategy for HER that improves learning efficiency. We propose the Goal Density-based hindsight experience Prioritization (GDP) method, which uses the density distribution of the achieved points and prioritizes achieved points that are rarely seen in the replay buffer. These points are used as substitute goals for HER. In addition, we propose a Prioritization Switching with Ensembling Strategy (PSES) method to switch between different experience prioritization algorithms during learning, which allows the best-performing one to be selected at each learning stage. We evaluate our method on several OpenAI Gym robotic manipulation tasks. The results show that GDP accelerates learning in most tasks and can be improved further when combined with other prioritization methods using PSES.
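
The idea of prioritizing rarely seen achieved points can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how density is estimated, so the example assumes a simple Gaussian kernel density estimate and an inverse-density weighting. The function names, the `bandwidth` parameter, and the `eps` constant are all hypothetical.

```python
import math
import random


def goal_density_priorities(achieved_goals, bandwidth=0.1, eps=1e-8):
    """Return sampling probabilities that favour rarely seen achieved goals.

    achieved_goals: sequence of equal-length coordinate tuples.
    bandwidth: Gaussian kernel width (assumed tuning parameter).
    """
    n = len(achieved_goals)
    densities = []
    for g in achieved_goals:
        total = 0.0
        for h in achieved_goals:
            # Squared Euclidean distance between two achieved points.
            sq_dist = sum((a - b) ** 2 for a, b in zip(g, h))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        # Unnormalised kernel density estimate at point g.
        densities.append(total / n)
    # Assumed weighting: low density gives high priority.
    weights = [1.0 / (d + eps) for d in densities]
    norm = sum(weights)
    return [w / norm for w in weights]


def sample_substitute_goals(achieved_goals, batch_size, rng, bandwidth=0.1):
    """Draw indices of substitute goals according to the density priorities."""
    probs = goal_density_priorities(achieved_goals, bandwidth)
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)
```

Calling `sample_substitute_goals(goals, 32, random.Random(0))` on a buffer where most achieved points cluster together and a few lie far away would tend to return the indices of the distant points more often. The pairwise loop is quadratic in the buffer size, so a practical version would likely need a cheaper density estimate.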

Original language	English
Title of host publication	29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020
Publisher	IEEE
Pages	432-437
Number of pages	6
ISBN (Electronic)	9781728160757
ISBN (Print)	978-1-7281-6076-4
DOIs	https://doi.org/10.1109/RO-MAN47096.2020.9223473
Publication status	Published - 14 Oct 2020
Event	29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020 - Virtual, Naples, Italy Duration: 31 Aug 2020 → 4 Sept 2020

Publication series

Name	29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020
Publisher	IEEE
ISSN (Print)	1944-9445
ISSN (Electronic)	1944-9437

Conference

Conference	29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020
Country/Territory	Italy
City	Virtual, Naples
Period	31/08/20 → 4/09/20

Bibliographical note

Funding Information:
This work is supported by the EPSRC-UK InDex project (EU CHIST-ERA programme), reference EP/S032355/1.

Publisher Copyright:
© 2020 IEEE.

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

Access to Document

10.1109/RO-MAN47096.2020.9223473

Cite this

Kuang, Y., Weinberg, A. I., Vogiatzis, G., & Faria, D. R. (2020). Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning. In 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020 (pp. 432-437). Article 9223473 (29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020). IEEE. https://doi.org/10.1109/RO-MAN47096.2020.9223473

Kuang, Yingyi ; Weinberg, Abraham Itzhak ; Vogiatzis, George et al. / Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning. 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020. IEEE, 2020. pp. 432-437 (29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020).

@inproceedings{69e524ec2af94ea6aec36f21be90fb1d,
title = "Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning",
abstract = "Reinforcement learning for multi-goal robot manipulation tasks is usually challenging, especially when sparse rewards are provided. It often requires millions of data collected before a stable strategy is learned. Recent algorithms like Hindsight Experience Replay (HER) have accelerated the learning process greatly by replacing the original desired goal with one of the achieved points (substitute goals) alongside the same trajectory. However, the selection of previous experience to learn is naively sampled in HER, in which the trajectory selection and the substitute goal sampling is completely random. In this paper, we discuss an experience prioritization strategy for HER that improves the learning efficiency. We propose the Goal Density-based hindsight experience Prioritization (GDP) method that focuses on utilizing the density distribution of the achieved points and prioritizes achieved points which are rarely seen in the replay buffer. These points are used as substitute goals for HER. In addition, we propose an Prioritization Switching with Ensembling Strategy (PSES) method to switch different experience prioritization algorithms during learning, which allows to select the best performance during each learning stage. We evaluate our method with several OpenAI Gym robotic manipulation tasks. The results show that GDP accelerates the learning process in most tasks and can be improved when combining with other prioritization methods using PSES.",

author = "Yingyi Kuang and Weinberg, {Abraham Itzhak} and George Vogiatzis and Faria, {Diego R.}",
note = "Funding Information: ACKNOWLEDGMENT This work is supported by EPSRC-UK InDex project (EU CHIST-ERA programme), with reference EP/S032355/1. Publisher Copyright: {\textcopyright} 2020 IEEE. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.; 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020 ; Conference date: 31-08-2020 Through 04-09-2020",
year = "2020",
month = oct,
day = "14",
doi = "10.1109/RO-MAN47096.2020.9223473",
language = "English",
isbn = "978-1-7281-6076-4",
series = "29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020",
publisher = "IEEE",
pages = "432--437",
booktitle = "29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020",
address = "United States",
}

Kuang, Y, Weinberg, AI, Vogiatzis, G & Faria, DR 2020, Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning. in 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020., 9223473, 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020, IEEE, pp. 432-437, 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020, Virtual, Naples, Italy, 31/08/20. https://doi.org/10.1109/RO-MAN47096.2020.9223473

Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning. / Kuang, Yingyi; Weinberg, Abraham Itzhak; Vogiatzis, George et al.
29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020. IEEE, 2020. p. 432-437 9223473 (29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020).

TY - GEN

T1 - Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning

AU - Kuang, Yingyi

AU - Weinberg, Abraham Itzhak

AU - Vogiatzis, George

AU - Faria, Diego R.

N1 - Funding Information: ACKNOWLEDGMENT This work is supported by EPSRC-UK InDex project (EU CHIST-ERA programme), with reference EP/S032355/1. Publisher Copyright: © 2020 IEEE. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.

PY - 2020/10/14

Y1 - 2020/10/14

N2 - Reinforcement learning for multi-goal robot manipulation tasks is usually challenging, especially when sparse rewards are provided. It often requires millions of data collected before a stable strategy is learned. Recent algorithms like Hindsight Experience Replay (HER) have accelerated the learning process greatly by replacing the original desired goal with one of the achieved points (substitute goals) alongside the same trajectory. However, the selection of previous experience to learn is naively sampled in HER, in which the trajectory selection and the substitute goal sampling is completely random. In this paper, we discuss an experience prioritization strategy for HER that improves the learning efficiency. We propose the Goal Density-based hindsight experience Prioritization (GDP) method that focuses on utilizing the density distribution of the achieved points and prioritizes achieved points which are rarely seen in the replay buffer. These points are used as substitute goals for HER. In addition, we propose an Prioritization Switching with Ensembling Strategy (PSES) method to switch different experience prioritization algorithms during learning, which allows to select the best performance during each learning stage. We evaluate our method with several OpenAI Gym robotic manipulation tasks. The results show that GDP accelerates the learning process in most tasks and can be improved when combining with other prioritization methods using PSES.

AB - Reinforcement learning for multi-goal robot manipulation tasks is usually challenging, especially when sparse rewards are provided. It often requires millions of data collected before a stable strategy is learned. Recent algorithms like Hindsight Experience Replay (HER) have accelerated the learning process greatly by replacing the original desired goal with one of the achieved points (substitute goals) alongside the same trajectory. However, the selection of previous experience to learn is naively sampled in HER, in which the trajectory selection and the substitute goal sampling is completely random. In this paper, we discuss an experience prioritization strategy for HER that improves the learning efficiency. We propose the Goal Density-based hindsight experience Prioritization (GDP) method that focuses on utilizing the density distribution of the achieved points and prioritizes achieved points which are rarely seen in the replay buffer. These points are used as substitute goals for HER. In addition, we propose an Prioritization Switching with Ensembling Strategy (PSES) method to switch different experience prioritization algorithms during learning, which allows to select the best performance during each learning stage. We evaluate our method with several OpenAI Gym robotic manipulation tasks. The results show that GDP accelerates the learning process in most tasks and can be improved when combining with other prioritization methods using PSES.

UR - http://www.scopus.com/inward/record.url?scp=85095793124&partnerID=8YFLogxK

UR - https://ieeexplore.ieee.org/document/9223473

U2 - 10.1109/RO-MAN47096.2020.9223473

DO - 10.1109/RO-MAN47096.2020.9223473

M3 - Conference publication

AN - SCOPUS:85095793124

SN - 978-1-7281-6076-4

T3 - 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020

SP - 432

EP - 437

BT - 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020

PB - IEEE

T2 - 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020

Y2 - 31 August 2020 through 4 September 2020

ER -

Kuang Y, Weinberg AI, Vogiatzis G, Faria DR. Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning. In 29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020. IEEE. 2020. p. 432-437. 9223473. (29th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2020). doi: 10.1109/RO-MAN47096.2020.9223473
