Biomedical Statistics and Informatics

Peer-Reviewed

Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data

Received: 29 July 2017    Accepted: 30 August 2017    Published: 20 October 2017


Abstract

Gene expression assays provide a fast and systematic way to identify disease markers relevant to clinical trials. In microarray experiments, differentially expressed genes, or discriminator genes, are genes whose expression patterns differ considerably between two user-defined groups. Microarray data typically cover a very large number of genes, and the question is which of them are responsible for, or discriminate, a particular disease. Identifying differentially expressed genes across multiple conditions has therefore become an active statistical problem in the analysis of large-scale microarray data. With this in view, we considered a simulated data set and a real data set (head and neck cancer). This paper uses three statistical methods, the t-test, the Wilcoxon signed-rank sum test and the renewed approach, to detect differential expression of genes between conditions and to find the number of differentially expressed genes. Additionally, Principal Component Analysis (PCA) and the largest difference from mean and data methods are used to visualize outliers and to find outliers numerically, respectively. Some artificial outliers were introduced into the simulated and real data sets; these outliers neither affect nor are related to the differentially expressed genes. The results show that 25, 126 and 385 differentially expressed genes are identified by the t-test, the Wilcoxon Rank sum test and the Renewed Approach, respectively. Across the three methods, 23 genes are identified in common, and these may be responsible for the cancer. The paper concludes that the two-sample mean test (t-test) is well suited to identifying differentially expressed genes in microarray data.
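As a rough illustration of the gene-wise testing described in the abstract, the sketch below simulates two groups and screens each gene with a Welch two-sample t statistic and a Wilcoxon rank-sum statistic. It is not the authors' code or data. The simulation sizes, the shift of 3 standard deviations, the Bonferroni threshold, and the use of a normal approximation for both p-values (chosen to avoid external dependencies) are all assumptions made for this example, and the renewed approach is not covered.

```python
import random
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic (does not assume equal variances)."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se


def rank_sum_z(x, y):
    """Wilcoxon rank-sum statistic for group x as a z-score.

    Uses the large-sample normal approximation without tie correction,
    which is adequate here because the simulated values are continuous.
    """
    n1, n2 = len(x), len(y)
    # Sort the pooled values, remembering which group each one came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return (w - mu) / sigma


def two_sided_p(z):
    """Two-sided p-value from a standard normal reference (an approximation)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical simulation: 200 genes, 20 samples per group, and the first
# 10 genes shifted by 3 standard deviations in the second group.
random.seed(0)
N_GENES, N_PER_GROUP, N_TRUE, SHIFT = 200, 20, 10, 3.0
control = [[random.gauss(0, 1) for _ in range(N_PER_GROUP)] for _ in range(N_GENES)]
disease = [
    [random.gauss(SHIFT if g < N_TRUE else 0.0, 1) for _ in range(N_PER_GROUP)]
    for g in range(N_GENES)
]

# Bonferroni-corrected threshold (an assumption; the abstract does not state
# how multiple testing was handled).
alpha = 0.05 / N_GENES

t_hits = [
    g for g in range(N_GENES)
    if two_sided_p(welch_t(control[g], disease[g])) < alpha
]
w_hits = [
    g for g in range(N_GENES)
    if two_sided_p(rank_sum_z(control[g], disease[g])) < alpha
]
common = sorted(set(t_hits) & set(w_hits))

print("t-test hits:", len(t_hits))
print("rank-sum hits:", len(w_hits))
print("common to both:", len(common))
```

Intersecting the two hit lists mirrors the abstract's idea of reporting genes found in common across methods. With real microarray data and small group sizes, exact t and rank-sum reference distributions would be preferable to the normal approximation used here.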

DOI 10.11648/j.bsi.20170204.16
Published in Biomedical Statistics and Informatics (Volume 2, Issue 4, December 2017)
Page(s) 166-171
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Microarray Gene Expression Data, T-Test, Renewed Approach, Wilcoxon Signed Rank Test, Differentially Expressed Genes, Outlier

Cite This Article
  • APA Style

    Harun or Rashid, Arefin Mowla, Siddikur Rahman, Siraj-Ud-Doulah, Bipul Hossen. (2017). Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data. Biomedical Statistics and Informatics, 2(4), 166-171. https://doi.org/10.11648/j.bsi.20170204.16


    ACS Style

    Harun or Rashid; Arefin Mowla; Siddikur Rahman; Siraj-Ud-Doulah; Bipul Hossen. Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data. Biomed. Stat. Inform. 2017, 2(4), 166-171. doi: 10.11648/j.bsi.20170204.16


    AMA Style

    Harun or Rashid, Arefin Mowla, Siddikur Rahman, Siraj-Ud-Doulah, Bipul Hossen. Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data. Biomed Stat Inform. 2017;2(4):166-171. doi: 10.11648/j.bsi.20170204.16


  • @article{10.11648/j.bsi.20170204.16,
      author = {Harun or Rashid and Arefin Mowla and Siddikur Rahman and Siraj-Ud-Doulah and Bipul Hossen},
      title = {Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data},
      journal = {Biomedical Statistics and Informatics},
      volume = {2},
      number = {4},
      pages = {166-171},
      doi = {10.11648/j.bsi.20170204.16},
      url = {https://doi.org/10.11648/j.bsi.20170204.16},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.bsi.20170204.16},
     year = {2017}
    }
    


  • TY  - JOUR
    T1  - Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data
    AU  - Harun or Rashid
    AU  - Arefin Mowla
    AU  - Siddikur Rahman
    AU  - Siraj-Ud-Doulah
    AU  - Bipul Hossen
    Y1  - 2017/10/20
    PY  - 2017
    N1  - https://doi.org/10.11648/j.bsi.20170204.16
    DO  - 10.11648/j.bsi.20170204.16
    T2  - Biomedical Statistics and Informatics
    JF  - Biomedical Statistics and Informatics
    JO  - Biomedical Statistics and Informatics
    SP  - 166
    EP  - 171
    PB  - Science Publishing Group
    SN  - 2578-8728
    UR  - https://doi.org/10.11648/j.bsi.20170204.16
    VL  - 2
    IS  - 4
    ER  - 


Author Information
  • Harun or Rashid, Arefin Mowla, Siddikur Rahman, Siraj-Ud-Doulah, Bipul Hossen: Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
