TY - JOUR
T1 - Distributed decentralized collaborative monitoring architecture for cloud infrastructures
AU - Xu, Xiaolong
AU - Chen, Yun
AU - Alcaraz Calero, Jose M.
PY - 2017/9
Y1 - 2017/9
N2 - Cloud computing infrastructures demand an efficient monitoring mechanism to guarantee the operational state of large-scale virtualized data centers and to provide mechanisms that improve the efficiency and stability of such infrastructures. Traditionally, centralized monitoring models (CMM) provide high performance and availability for the group of nodes in charge of monitoring tasks. However, the centralized nature of this architecture easily leads to a single point of failure, performance bottlenecks, and an unbalanced distribution of the monitoring workload. These drawbacks make CMM unsuitable for large-scale cloud infrastructures. To tackle this concern, the main contribution of this paper is a distributed collaborative monitoring model (DCMM) for cloud computing infrastructures. DCMM provides self-organization capabilities based on mutual perception and balanced monitoring of each node. DCMM also provides rapid notification and recovery mechanisms under degraded conditions. In addition, an adaptive threshold control algorithm (ATCA) is proposed to dynamically adapt the sets of thresholds used for notification purposes in order to identify unnecessary duplicate information sent back to the monitoring tool. ATCA is based on historical monitoring records. Both DCMM and ATCA are described in detail in this contribution. Several empirical experiments were conducted on an OpenStack cloud infrastructure to validate these claims. Experimental results show that DCMM with ATCA can efficiently balance the monitoring workload, reduce the workload of monitoring nodes, avoid a single point of failure, and reduce bottleneck problems, while contributing to real-time monitoring and data consistency within the monitoring architecture.
AB - Cloud computing infrastructures demand an efficient monitoring mechanism to guarantee the operational state of large-scale virtualized data centers and to provide mechanisms that improve the efficiency and stability of such infrastructures. Traditionally, centralized monitoring models (CMM) provide high performance and availability for the group of nodes in charge of monitoring tasks. However, the centralized nature of this architecture easily leads to a single point of failure, performance bottlenecks, and an unbalanced distribution of the monitoring workload. These drawbacks make CMM unsuitable for large-scale cloud infrastructures. To tackle this concern, the main contribution of this paper is a distributed collaborative monitoring model (DCMM) for cloud computing infrastructures. DCMM provides self-organization capabilities based on mutual perception and balanced monitoring of each node. DCMM also provides rapid notification and recovery mechanisms under degraded conditions. In addition, an adaptive threshold control algorithm (ATCA) is proposed to dynamically adapt the sets of thresholds used for notification purposes in order to identify unnecessary duplicate information sent back to the monitoring tool. ATCA is based on historical monitoring records. Both DCMM and ATCA are described in detail in this contribution. Several empirical experiments were conducted on an OpenStack cloud infrastructure to validate these claims. Experimental results show that DCMM with ATCA can efficiently balance the monitoring workload, reduce the workload of monitoring nodes, avoid a single point of failure, and reduce bottleneck problems, while contributing to real-time monitoring and data consistency within the monitoring architecture.
KW - Cloud computing
KW - Resource monitoring
KW - Collaborative monitoring
KW - Adaptive threshold control
UR - https://link.springer.com/article/10.1007/s10586-016-0675-5
U2 - 10.1007/s10586-016-0675-5
DO - 10.1007/s10586-016-0675-5
M3 - Article
SN - 1573-7543
VL - 20
SP - 2451
EP - 2463
JO - Cluster Computing
JF - Cluster Computing
ER -