OPTIMIZATION OF K-MODE ALGORITHM FOR DATA MINING USING PARTICLE SWARM OPTIMIZATION
Abstract
K-mode is a popular data mining algorithm because it handles categorical data effectively. Its main methodological weakness is the choice of initial cluster centers, which usually affects the clustering results. This research proposes a novel PSO-based K-mode algorithm, called PSOKM, that uses particle swarm optimization (PSO) to improve the performance of the K-mode clustering algorithm. A fitness function was defined based on the structure of the K-mode algorithm and a set of weights, and the cluster centroids were optimized using PSO. The initial cost for the PSO was taken from K-mode; the weights were picked at random, and two centroids from each class were randomly picked. The research used University of California, Irvine (UCI) data sets and crime data to evaluate the performance of the PSOKM algorithm against the conventional K-mode algorithm using metrics such as accuracy, time, sensitivity, specificity and the ROC curve. The evaluation results reveal that PSOKM improved the accuracy of the K-mode algorithm from 76% to 89.4% on the crime data. The reliability of the algorithm's performance was also assessed using the UCI data sets, and the results were compared with those of other variant algorithms. The comparison revealed that PSOKM performed better than the respective variants in most cases.
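To illustrate why the choice of initial centers matters, the following minimal Python sketch computes the standard K-mode clustering cost: the sum, over all records, of the simple matching dissimilarity to the nearest center. It is not the PSOKM fitness function itself. The abstract does not specify how the weights enter the fitness, so they are omitted here, and the toy data and the names `matching_dissimilarity` and `clustering_cost` are assumptions made for illustration only.

```python
from typing import Sequence

Record = Sequence[str]


def matching_dissimilarity(a: Record, b: Record) -> int:
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def clustering_cost(data: Sequence[Record], modes: Sequence[Record]) -> int:
    """Sum, over all records, of the dissimilarity to the nearest mode."""
    return sum(
        min(matching_dissimilarity(row, mode) for mode in modes)
        for row in data
    )


# Hypothetical toy data with attributes (colour, size, shape).
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]

# Two candidate sets of initial centers, both drawn from the data.
poor_start = [data[0], data[1]]    # both centers from the "red" records
better_start = [data[0], data[3]]  # one "red" and one "blue" center

print(clustering_cost(data, poor_start))    # 6
print(clustering_cost(data, better_start))  # 4
```

The two starting points give different costs on the same data, which is the sensitivity that a search procedure such as PSO is meant to reduce: each particle could encode one candidate set of centers, with a cost of this kind as the quantity to minimize.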
Copyright (c) 2023 FUDMA JOURNAL OF SCIENCES
This work is licensed under a Creative Commons Attribution 4.0 International License.