TY - JOUR
T1 - Density-based clustering using approximate natural neighbours
AU - Angelova, Maia
AU - Beliakov, Gleb
AU - Zhu, Ye
N1 - Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2019/12/4
Y1 - 2019/12/4
N2 - We propose a computationally efficient natural neighbour based metric for discovering clusters of arbitrary shape based on fuzzy measures. The approximate natural neighbours are found with the help of the Choquet integral with respect to a specially designed two-additive fuzzy measure. Fuzzy betweenness relation is used to construct such a measure and helps determine the natural neighbours of a query point. The natural neighbours of a datum allow the computation of point density estimate, which in turn defines a density-based metric suitable for clustering. The proposed method overcomes the exponential computational complexity of the Delaunay triangulation traditionally used to identify the natural neighbours. The run-time of the density estimate by this metric keeps the same quadratic trends as the Euclidean distance based estimates with respect to the data size. Empirical evaluation based on 20 synthetic and real-world datasets shows that this metric has a higher clustering accuracy for existing state-of-the-art density-based clustering algorithms, such as DBSCAN, SNN and DP. Furthermore, the proposed metric is easily combined with these algorithms, and the enhanced clustering algorithms inherit positive features such as resistance to noise and imbalanced data.
AB - We propose a computationally efficient natural neighbour based metric for discovering clusters of arbitrary shape based on fuzzy measures. The approximate natural neighbours are found with the help of the Choquet integral with respect to a specially designed two-additive fuzzy measure. Fuzzy betweenness relation is used to construct such a measure and helps determine the natural neighbours of a query point. The natural neighbours of a datum allow the computation of point density estimate, which in turn defines a density-based metric suitable for clustering. The proposed method overcomes the exponential computational complexity of the Delaunay triangulation traditionally used to identify the natural neighbours. The run-time of the density estimate by this metric keeps the same quadratic trends as the Euclidean distance based estimates with respect to the data size. Empirical evaluation based on 20 synthetic and real-world datasets shows that this metric has a higher clustering accuracy for existing state-of-the-art density-based clustering algorithms, such as DBSCAN, SNN and DP. Furthermore, the proposed metric is easily combined with these algorithms, and the enhanced clustering algorithms inherit positive features such as resistance to noise and imbalanced data.
KW - Choquet integral
KW - Clustering
KW - Density estimate
KW - Fuzzy betweenness relation
KW - Fuzzy measure
UR - http://www.scopus.com/inward/record.url?scp=85079895038&partnerID=8YFLogxK
UR - https://www.sciencedirect.com/science/article/pii/S1568494619306489?via%3Dihub
U2 - 10.1016/j.asoc.2019.105867
DO - 10.1016/j.asoc.2019.105867
M3 - Article
AN - SCOPUS:85079895038
SN - 1568-4946
VL - 85
JO - Applied Soft Computing
JF - Applied Soft Computing
M1 - 105867
ER -