Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis
American Journal of Theoretical and Applied Statistics
Volume 4, Issue 5, September 2015, Pages: 317-321
Received: Jul. 5, 2015; Accepted: Jul. 17, 2015; Published: Jul. 28, 2015
Authors
Md. Shamim Reza, Department of Mathematics, Pabna University of Science & Technology, Pabna, Bangladesh
Sabba Ruhi, Department of Mathematics, Pabna University of Science & Technology, Pabna, Bangladesh
Abstract
Over the last two decades, clustering has become a well-recognized area of research in data mining. Data clustering plays a major role in pattern recognition, signal processing, bioinformatics and artificial intelligence. Clustering is an unsupervised learning technique that groups objects by their similarity, so that objects in the same group are similar and objects in different groups are dissimilar. This paper analyzes three clustering techniques, K-means, principal component analysis (PCA) and independent component analysis (ICA), on real and simulated data. Recent developments have found a rather unexpected application of ICA theory in data clustering, outlier detection and multivariate data visualization. Accurate identification of data clusters plays an important role in statistical analysis. In this paper we explore the connections among these three techniques for identifying clusters in multivariate data and develop a new method that applies k-means to the PCA or ICA components; the results show that ICA-based clustering performs better than the others.
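The proposed approach runs k-means on component scores instead of on the raw variables. The sketch below illustrates only the PCA variant, under stated assumptions: the first principal component is found by power iteration on the sample covariance matrix, the clustering step is plain Lloyd's k-means with randomly chosen starting rows, and the function names `first_pc_scores` and `kmeans` are hypothetical. It is not the authors' code, and the ICA variant (which would replace the PCA projection with an ICA unmixing step such as FastICA) is not shown.

```python
import math
import random


def first_pc_scores(X, n_iter=200):
    """Project rows of X onto the first principal component.

    Assumed approach: power iteration on the sample covariance matrix.
    """
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, means)] for row in X]  # center each variable
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each score is a one-element row so it can be passed straight to kmeans().
    return [[sum(a * b for a, b in zip(row, v))] for row in Xc]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen rows
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy data: two groups separated along the first variable.
    data = [[0.0, 0.1], [0.5, -0.2], [0.2, 0.3], [10.0, 0.0], [10.4, 0.2], [9.8, -0.1]]
    labels, _ = kmeans(first_pc_scores(data), k=2)
    print(labels)
```

The cluster numbers themselves are arbitrary (they depend on the random starting rows); only the grouping matters. Keeping `kmeans()` separate from the projection means the same clustering step can be reused unchanged when the PCA scores are swapped for ICA components.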
Keywords
Clustering, K-means, PCA, ICA
To cite this article
Md. Shamim Reza, Sabba Ruhi, Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis, American Journal of Theoretical and Applied Statistics. Vol. 4, No. 5, 2015, pp. 317-321. doi: 10.11648/j.ajtas.20150405.11