Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering

Yiyong Xiao, Changhao Huang, Jiaoying Huang*, Ikou Kaku, Yuchun Xu

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

Abstract

The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have some drawbacks, e.g., they can be trapped into local optima and sensitive to initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering results for some special datasets regardless the selection of the initial centers. In this paper, we developed an integer linear programming (ILP) approach for the k-modes clustering, which is independent to the initial solution and can obtain directly the optimal results for small-sized datasets. We also developed a heuristic algorithm that implements iterative partial optimization in the ILP approach based on a framework of variable neighborhood search, known as IPO-ILP-VNS, to search for near-optimal results of medium and large sized datasets with controlled computing time. Experiments on 38 datasets, including 27 synthesized small datasets and 11 known benchmark datasets from the UCI site were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The experimental results outperformed the conventional and other existing enhanced k-modes algorithms in literature, updated 9 of the UCI benchmark datasets with new and improved results.

Original languageEnglish
Pages (from-to)183-195
Number of pages13
JournalPattern Recognition
Volume90
Early online date29 Jan 2019
DOIs
Publication statusPublished - Jun 2019

Bibliographical note

© 2019, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/

Funding: This work is partially supported by the National Natural Science Foundation of China under grant nos. 71871003 and 61376042.

Keywords

  • Categorical clustering
  • Data mining
  • Integer linear programming
  • Variable neighborhood search

Fingerprint

Dive into the research topics of 'Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering'. Together they form a unique fingerprint.

Cite this