Event-driven temporal models for explanations-ETeMoX: explaining reinforcement learning

Juan Parra; Antonio Garcia-Dominguez; Nelly Bencomo; Changgang Zheng; Chen Zhen; Juan Boubeta-Puig; Guadalupe Ortiz; Shufan Yang

doi:10.1007/s10270-021-00952-4

Juan Parra^*, Antonio Garcia-Dominguez, Nelly Bencomo, Changgang Zheng, Chen Zhen, Juan Boubeta-Puig, Guadalupe Ortiz, Shufan Yang

^*Corresponding author for this work

College of Engineering and Physical Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Modern software systems are increasingly expected to show higher degrees of autonomy and self-management to cope with uncertain and diverse situations. As a consequence, autonomous systems can exhibit unexpected and surprising behaviours. This is exacerbated by the ubiquity and complexity of Artificial Intelligence (AI)-based systems. This is the case for Reinforcement Learning (RL), where autonomous agents learn through trial and error how to find good solutions to a problem. Thus, the underlying decision-making criteria may become opaque to users who interact with the system and may require explanations about the system’s reasoning. Available work on eXplainable Reinforcement Learning (XRL) offers different trade-offs: e.g. for runtime explanations, the approaches are model-specific or can only analyse results after the fact. Unlike these approaches, this paper aims to provide an online, model-agnostic approach for XRL towards trustworthy and understandable AI. We present ETeMoX, an architecture based on temporal models to keep track of the decision-making processes of RL systems. In cases where resources are limited (e.g. storage capacity or response time), the architecture also integrates complex event processing, an event-driven approach, to detect matches to event patterns that need to be stored, instead of keeping the entire history. The approach is applied to a mobile communications case study that uses RL for its decision-making. To test the generalisability of the approach, three variants of the underlying RL algorithms are used: Q-Learning, SARSA and DQN. The encouraging results show that, using the proposed configurable architecture, RL developers are able to obtain explanations about the evolution of a metric and the relationships between metrics, and to track situations of interest happening over time windows.
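The idea of storing only the events that match a pattern, rather than the entire history, can be illustrated with a minimal sketch. It is not the ETeMoX implementation: the `StepEvent` fields, the `WindowedRewardDropFilter` class, and the "reward drops well below the recent average" pattern are all hypothetical names and rules chosen for illustration, and a plain Python sliding window stands in for a complex event processing engine.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class StepEvent:
    """One RL decision step. Fields are illustrative, not the paper's schema."""
    step: int
    state: str
    action: str
    reward: float


class WindowedRewardDropFilter:
    """Stores only events whose reward falls well below the recent average.

    Hypothetical stand-in for an event pattern evaluated over a time window.
    """

    def __init__(self, window_size: int, drop_threshold: float) -> None:
        # Sliding window over the most recent rewards.
        self.window: Deque[float] = deque(maxlen=window_size)
        self.drop_threshold = drop_threshold
        # Reduced history: only the events that matched the pattern.
        self.history: List[StepEvent] = []

    def observe(self, event: StepEvent) -> bool:
        """Process one event and return True if it matched and was stored."""
        matched = False
        # Only evaluate the pattern once the window is full.
        if len(self.window) == self.window.maxlen:
            average = sum(self.window) / len(self.window)
            matched = (average - event.reward) > self.drop_threshold
        if matched:
            self.history.append(event)
        # The current reward joins the window after the check.
        self.window.append(event.reward)
        return matched


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0, 0.9, -2.0, 1.0, 1.0, 1.1, 1.0]
    event_filter = WindowedRewardDropFilter(window_size=3, drop_threshold=1.0)
    for i, r in enumerate(rewards):
        event_filter.observe(StepEvent(step=i, state="s", action="a", reward=r))
    # Only the sharp drop at step 4 is kept in the stored history.
    print([e.step for e in event_filter.history])
```

In this sketch nine events are observed but only one is stored, which is the kind of trade-off the abstract describes for settings with limited storage or response time.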

Original language	English
Pages (from-to)	1091–1113
Number of pages	23
Journal	Software and Systems Modeling
Volume	21
Issue number	3
Early online date	18 Dec 2021
DOIs	https://doi.org/10.1007/s10270-021-00952-4
Publication status	Published - Jun 2022

Bibliographical note

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Funding: This work has been partially sponsored by The Leverhulme Trust Fellowship “QuantUn: quantification of uncertainty using Bayesian surprises” (Grant No. RF-2019-548/9), the EPSRC Research Project Twenty20Insight (Grant No. EP/T017627/1), The Royal Society of Edinburgh project “A Reinforcement Learning Based Resource Management System for Long Term Care for Elderly People” (Grant No. 961_Yang), the Spanish Ministry of Science and Innovation and the European Regional Development Funds under project FAME (Grant No. RTI2018-093608-B-C33), and the Research Plan from the University of Cadiz and Grupo Energético de Puerto Real S.A. under project GANGES (Grant No. IRTP03_UCA).

Keywords

Artificial intelligence
Complex event processing
Event-driven monitoring
Explainable reinforcement learning
Temporal models

Access to Document

10.1007/s10270-021-00952-4 (Licence: CC BY 4.0)

Event-driven temporal models for explanations - ETeMoX
Final published version, 2.6 MB (Licence: CC BY 4.0)

Cite this

@article{b6473ea647974f6ba7775d24a167b858,

title = "Event-driven temporal models for explanations-ETeMoX: explaining reinforcement learning",

keywords = "Artificial intelligence, Complex event processing, Event-driven monitoring, Explainable reinforcement learning, Temporal models",

author = "Juan Parra and Antonio Garcia-Dominguez and Nelly Bencomo and Changgang Zheng and Chen Zhen and Juan Boubeta-Puig and Guadalupe Ortiz and Shufan Yang",

year = "2022",

month = jun,

doi = "10.1007/s10270-021-00952-4",

language = "English",

volume = "21",

pages = "1091--1113",

journal = "Software and Systems Modeling",

issn = "1619-1366",

publisher = "Springer",

number = "3",

}

TY - JOUR

T1 - Event-driven temporal models for explanations-ETeMoX: explaining reinforcement learning

AU - Parra, Juan

AU - Garcia-Dominguez, Antonio

AU - Bencomo, Nelly

AU - Zheng, Changgang

AU - Zhen, Chen

AU - Boubeta-Puig, Juan

AU - Ortiz, Guadalupe

AU - Yang, Shufan

PY - 2022/6

Y1 - 2022/6

KW - Artificial intelligence

KW - Complex event processing

KW - Event-driven monitoring

KW - Explainable reinforcement learning

KW - Temporal models

UR - https://link.springer.com/article/10.1007%2Fs10270-021-00952-4

UR - http://www.scopus.com/inward/record.url?scp=85121417819&partnerID=8YFLogxK

U2 - 10.1007/s10270-021-00952-4

DO - 10.1007/s10270-021-00952-4

M3 - Article

SN - 1619-1366

VL - 21

SP - 1091

EP - 1113

JO - Software and Systems Modeling

JF - Software and Systems Modeling

IS - 3

ER -
