TY - GEN
T1 - An Improved Visual Assessment with Data-Dependent Kernel for Stream Clustering
AU - Zhang, Baojie
AU - Cao, Yang
AU - Zhu, Ye
AU - Rajasegarar, Sutharshan
AU - Liu, Gang
AU - Li, Hong Xian
AU - Angelova, Maia
AU - Li, Gang
PY - 2023/5/27
Y1 - 2023/5/27
AB - Advances in 5G and the Internet of Things enable more devices and sensors to be interconnected. Unlike traditional data, the large volume of data generated by various sensors and devices requires real-time analysis. Data objects in a stream change over time and can be accessed only once. Thus, traditional methods no longer meet the needs of fast exploratory analysis of continuously generated data. Cluster tendency assessment is an effective way to determine the number of potential clusters. Recently, methods based on Visual Assessment of cluster Tendency (VAT) have been proposed for visualising cluster structures in streaming data using cluster heat maps. However, these heat maps rely on Euclidean distance, which does not account for the characteristics of the data distribution. Consequently, it can be difficult to separate adjacent clusters of varied densities. In this paper, we discuss this issue for the latest inc-siVAT method and propose using a data-dependent kernel to overcome it when clustering streaming data. Extensive evaluation on 7 large synthetic and real-world datasets shows the superiority of kernel-based inc-siVAT over 4 recently published state-of-the-art online and offline clustering algorithms.
KW - Cluster tendency assessment
KW - Clustering
KW - Data stream
KW - Isolation kernel
KW - VAT
UR - https://link.springer.com/chapter/10.1007/978-3-031-33374-3_16
UR - http://www.scopus.com/inward/record.url?scp=85173569917&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-33374-3_16
DO - 10.1007/978-3-031-33374-3_16
M3 - Conference publication
AN - SCOPUS:85173569917
SN - 9783031333736
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 197
EP - 209
BT - Advances in Knowledge Discovery and Data Mining - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2023, Proceedings
A2 - Kashima, Hisashi
A2 - Ide, Tsuyoshi
A2 - Peng, Wen-Chih
PB - Springer
T2 - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2023
Y2 - 25 May 2023 through 28 May 2023
ER -