Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data

Harun or Rashid; Arefin Mowla; Siddikur Rahman; Siraj-Ud-Doulah; Bipul Hossen

doi:10.11648/j.bsi.20170204.16

Peer-Reviewed

Received: 29 July 2017 Accepted: 30 August 2017 Published: 20 October 2017

Abstract

Gene expression assays provide a fast and systematic way to identify disease markers relevant to modern clinical trials. In microarray experiments, differentially expressed genes, or discriminator genes, are genes whose expression patterns differ considerably between two user-defined groups. Microarray data typically contain a very large number of genes, and the task is to determine which of them are responsible for, or discriminate, a particular disease. Identifying differentially expressed genes across multiple conditions has become an active statistical problem in the analysis of large-scale microarray data. In this context, we consider a simulated data set and a real data set (head and neck cancer). This paper applies three statistical methods, the t-test, the Wilcoxon rank-sum test and the renewed approach, to detect differential expression between conditions and to determine the number of differentially expressed genes. In addition, Principal Component Analysis (PCA) is used to visualize outliers, and the "largest difference from mean and data" method is used to find numerical outliers. Artificial outliers are also introduced into the simulated and real data sets; these outliers are found not to affect, and not to be related to, the differentially expressed genes. The t-test, the Wilcoxon rank-sum test and the renewed approach identify 25, 126 and 385 differentially expressed genes, respectively, and 23 genes are common to all three methods; these genes may be involved in the cancer. This paper shows that the two-sample mean test (t-test) is well suited to identifying differentially expressed genes in microarray data.

DOI	10.11648/j.bsi.20170204.16
Published in	Biomedical Statistics and Informatics (Volume 2, Issue 4, December 2017)
Page(s)	166-171
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Microarray Gene Expression Data, T-Test, Renewed Approach, Wilcoxon Signed Rank Test, Differentially Expressed Genes, Outlier

Cite This Article

APA Style

Harun or Rashid, Arefin Mowla, Siddikur Rahman, Siraj-Ud-Doulah, Bipul Hossen. (2017). Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data. Biomedical Statistics and Informatics, 2(4), 166-171. https://doi.org/10.11648/j.bsi.20170204.16

ACS Style

Harun or Rashid; Arefin Mowla; Siddikur Rahman; Siraj-Ud-Doulah; Bipul Hossen. Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data. Biomed. Stat. Inform. 2017, 2(4), 166-171. doi: 10.11648/j.bsi.20170204.16

AMA Style

Harun or Rashid, Arefin Mowla, Siddikur Rahman, Siraj-Ud-Doulah, Bipul Hossen. Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data. Biomed Stat Inform. 2017;2(4):166-171. doi: 10.11648/j.bsi.20170204.16

@article{10.11648/j.bsi.20170204.16,
  author = {Harun or Rashid and Arefin Mowla and Siddikur Rahman and Siraj-Ud-Doulah and Bipul Hossen},
  title = {Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data},
  journal = {Biomedical Statistics and Informatics},
  volume = {2},
  number = {4},
  pages = {166-171},
  doi = {10.11648/j.bsi.20170204.16},
  url = {https://doi.org/10.11648/j.bsi.20170204.16},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.bsi.20170204.16},
 year = {2017}
}

TY  - JOUR
T1  - Statistical Tests for Identification of Differentially Expressed Genes in Microarray Data
AU  - Harun or Rashid
AU  - Arefin Mowla
AU  - Siddikur Rahman
AU  - Siraj-Ud-Doulah
AU  - Bipul Hossen
Y1  - 2017/10/20
PY  - 2017
N1  - https://doi.org/10.11648/j.bsi.20170204.16
DO  - 10.11648/j.bsi.20170204.16
T2  - Biomedical Statistics and Informatics
JF  - Biomedical Statistics and Informatics
JO  - Biomedical Statistics and Informatics
SP  - 166
EP  - 171
PB  - Science Publishing Group
SN  - 2578-8728
UR  - https://doi.org/10.11648/j.bsi.20170204.16
VL  - 2
IS  - 4
ER  -

Author Information

Harun or Rashid

Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Arefin Mowla

Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Siddikur Rahman

Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Siraj-Ud-Doulah

Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
Bipul Hossen

Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur, Bangladesh
