Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis

Md. Shamim Reza; Sabba Ruhi

doi:10.11648/j.ajtas.20150405.11

Peer-Reviewed

Published in American Journal of Theoretical and Applied Statistics (Volume 4, Issue 5)

Received: 5 July 2015 Accepted: 17 July 2015 Published: 28 July 2015

Abstract

Over the last two decades, clustering has become a well-recognized area of research in data mining. Data clustering is a major research topic in pattern recognition, signal processing, bioinformatics, and artificial intelligence. Clustering is an unsupervised learning technique that groups objects by similarity, so that objects in the same group are similar and objects in different groups are dissimilar. This paper analyzes three techniques for clustering multivariate data, K-means, principal component analysis (PCA), and independent component analysis (ICA), on real and simulated data. Recent developments have found a rather unexpected application of ICA theory in data clustering, outlier detection, and multivariate data visualization. Accurate identification of clusters plays an important role in statistical analysis. In this paper we explore the connections among these three techniques for identifying clusters in multivariate data and develop a new method that applies K-means to PCA or ICA components; the results show that ICA-based clustering performs better than the other approaches.
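The idea of running K-means on component scores rather than on the raw variables can be illustrated with a short sketch. This is a minimal NumPy sketch under our own assumptions, not the authors' code: PCA is computed by SVD, ICA uses symmetric FastICA with a tanh nonlinearity (the paper may use a different ICA estimator), K-means is Lloyd's algorithm with random initial centroids, and the helper names (`kmeans`, `pca_scores`, `fastica_scores`, `agreement`), the number of retained components `m`, and the simulated data are all hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def pca_scores(X, m):
    """Project centred data onto its first m principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T


def fastica_scores(X, m, n_iter=200, tol=1e-6, seed=0):
    """Assumption: symmetric FastICA with g = tanh, one of several possible ICA estimators."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whiten: keep the top m principal directions, scaled so the sample covariance is I.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :m] * np.sqrt(len(X))

    def sym_decorrelate(W):
        # W <- (W W^T)^(-1/2) W keeps the unmixing rows orthonormal.
        vals, vecs = np.linalg.eigh(W @ W.T)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W

    W = sym_decorrelate(rng.standard_normal((m, m)))
    for _ in range(n_iter):
        g = np.tanh(Z @ W.T)            # n x m nonlinearity of current estimates
        g_prime = 1.0 - g ** 2
        # Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w, for every row at once.
        W_new = sym_decorrelate((g.T @ Z) / len(Z) - g_prime.mean(axis=0)[:, None] * W)
        # Converged when each new row is parallel (up to sign) to the old one.
        converged = np.max(np.abs(np.abs(np.einsum("ij,ij->i", W_new, W)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


def agreement(labels, truth):
    """Fraction of correctly grouped points for two clusters, up to label swapping."""
    match = np.mean(labels == truth)
    return max(match, 1.0 - match)


if __name__ == "__main__":
    # Hypothetical simulated data: two groups that differ along one variable only,
    # mixed with four pure-noise variables.
    rng = np.random.default_rng(42)
    n = 200
    truth = np.repeat([0, 1], n // 2)
    X = rng.standard_normal((n, 5))
    X[:, 0] += 4.0 * truth
    for name, data in [("raw", X), ("PCA", pca_scores(X, 2)), ("ICA", fastica_scores(X, 2))]:
        labels, _ = kmeans(data, 2)
        print(f"{name}: agreement = {agreement(labels, truth):.2f}")
```

Running the script prints one agreement rate per representation (raw variables, PCA scores, ICA scores). The values depend on the simulated data, the seeds, and the choice of `m`; they are not the results reported in the paper.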

Page(s)	317-321
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Clustering, K-means, PCA, ICA

Cite This Article

APA Style

Md. Shamim Reza, Sabba Ruhi. (2015). Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis. American Journal of Theoretical and Applied Statistics, 4(5), 317-321. https://doi.org/10.11648/j.ajtas.20150405.11

ACS Style

Md. Shamim Reza; Sabba Ruhi. Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis. Am. J. Theor. Appl. Stat. 2015, 4(5), 317-321. doi: 10.11648/j.ajtas.20150405.11

AMA Style

Md. Shamim Reza, Sabba Ruhi. Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis. Am J Theor Appl Stat. 2015;4(5):317-321. doi: 10.11648/j.ajtas.20150405.11

BibTeX

@article{10.11648/j.ajtas.20150405.11,
  author = {Md. Shamim Reza and Sabba Ruhi},
  title = {Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis},
  journal = {American Journal of Theoretical and Applied Statistics},
  volume = {4},
  number = {5},
  pages = {317-321},
  doi = {10.11648/j.ajtas.20150405.11},
  url = {https://doi.org/10.11648/j.ajtas.20150405.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20150405.11},
  year = {2015}
}

RIS

TY  - JOUR
T1  - Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis
AU  - Md. Shamim Reza
AU  - Sabba Ruhi
Y1  - 2015/07/28
PY  - 2015
N1  - https://doi.org/10.11648/j.ajtas.20150405.11
DO  - 10.11648/j.ajtas.20150405.11
T2  - American Journal of Theoretical and Applied Statistics
JF  - American Journal of Theoretical and Applied Statistics
JO  - American Journal of Theoretical and Applied Statistics
SP  - 317
EP  - 321
PB  - Science Publishing Group
SN  - 2326-9006
UR  - https://doi.org/10.11648/j.ajtas.20150405.11
VL  - 4
IS  - 5
ER  -

Author Information

Md. Shamim Reza

Department of Mathematics, Pabna University of Science & Technology, Pabna, Bangladesh
Sabba Ruhi

Department of Mathematics, Pabna University of Science & Technology, Pabna, Bangladesh
