An Improved Visual Assessment with Data-Dependent Kernel for Stream Clustering

Baojie Zhang, Yang Cao*, Ye Zhu, Sutharshan Rajasegarar, Gang Liu, Hong Xian Li, Maia Angelova, Gang Li

*Corresponding author for this work

Research output: Chapter in Book/Published conference output (Conference publication)

Abstract

Advances in 5G and the Internet of Things enable more devices and sensors to be interconnected. Unlike traditional data, the large volume of data generated by these sensors and devices requires real-time analysis. Data objects in a stream change over time and can be accessed only once. Traditional methods therefore no longer meet the need for fast exploratory analysis of continuously generated data. Cluster tendency assessment is an effective way to determine the number of potential clusters. Recently, methods based on Visual Assessment of cluster Tendency (VAT) have been proposed for visualising cluster structures in streaming data using cluster heat maps. However, those heat maps rely on Euclidean distance, which does not consider the characteristics of the data distribution. Consequently, it is difficult to separate adjacent clusters of varied densities. In this paper, we discuss this issue for the latest inc-siVAT method and propose a data-dependent kernel method to overcome it when clustering streaming data. Extensive evaluation on 7 large synthetic and real-world datasets shows the superiority of kernel-based inc-siVAT over 4 recently published state-of-the-art online and offline clustering algorithms.
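
For readers unfamiliar with VAT, the following is a minimal sketch of the classic batch VAT reordering on a Euclidean distance matrix, which is the baseline the abstract refers to. It is not the inc-siVAT method and not the kernel-based variant proposed in the paper. The function names (`euclidean_matrix`, `vat_order`, `reorder`) and the toy points are illustrative assumptions, not taken from the paper.

```python
import math

def euclidean_matrix(points):
    """Pairwise Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def vat_order(dist):
    """Classic VAT ordering: a Prim-style traversal of a square dissimilarity matrix."""
    n = len(dist)
    # Start from a point that is an endpoint of the largest dissimilarity.
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # nearest[j]: smallest dissimilarity from j to any already ordered point.
    nearest = {j: dist[start][j] for j in remaining}
    while remaining:
        # Append the unordered point closest to the ordered set.
        nxt = min(remaining, key=lambda j: nearest[j])
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            nearest[j] = min(nearest[j], dist[nxt][j])
    return order

def reorder(dist, order):
    """Permute rows and columns; plotting the result as a heat map gives the VAT image."""
    return [[dist[i][j] for j in order] for i in order]

# Toy data (hypothetical): two tight groups, interleaved by index.
points = [(0, 0), (5, 5), (0.1, 0), (5, 5.1), (0, 0.1), (5.1, 5)]
dist = euclidean_matrix(points)
order = vat_order(dist)
vat_image = reorder(dist, order)
```

In the reordered matrix, points from the same group become adjacent, so a heat map of `vat_image` would show dark blocks along the diagonal. Replacing `euclidean_matrix` with a data-dependent dissimilarity is the kind of substitution the abstract describes, though the details are in the paper itself.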

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2023, Proceedings
Editors: Hisashi Kashima, Tsuyoshi Ide, Wen-Chih Peng
Publisher: Springer
Pages: 197-209
Number of pages: 13
ISBN (Print): 9783031333736
Publication status: Published - 27 May 2023
Event: 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2023 - Osaka, Japan
Duration: 25 May 2023 - 28 May 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13935 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2023
Country/Territory: Japan
City: Osaka
Period: 25/05/23 - 28/05/23

Keywords

  • Cluster tendency assessment
  • Clustering
  • Data stream
  • Isolation kernel
  • VAT
