Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables

Egbo Ikechukwu

doi:10.11648/j.ajtas.20160504.12

| Peer-Reviewed |

Received: 1 April 2016 Accepted: 19 April 2016 Published: 4 June 2016

Abstract

Classification problems often suffer from small samples in conjunction with a large number of features, which makes error estimation problematic. When a sample is small, there are insufficient data to split the sample, and the same data are used for both classifier design and error estimation. Error estimation can suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. This paper is concerned with the evaluation of error rate estimators in two-group discriminant analysis with multivariate binary variables. The behaviour of eight of the most commonly used estimators is compared and contrasted by means of Monte Carlo simulation. The criterion used for comparing these error rate estimators is the sum squared error rate (SSE). Four experimental factors are considered in the simulation, namely the number of variables, the sample size relative to the number of variables, the prior probability, and the correlation between the variables in the populations. From the analysis carried out, the estimators can be ranked as follows: DS, O, OS, U, R, JK, P and D.
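
The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the abstract does not define the eight estimators or the classification rule, so the sketch assumes two textbook estimators (resubstitution and leave-one-out), an independence rule for binary variables with add-one smoothing, uncorrelated variables, and equal group sizes. All function and parameter names (fit_rule, simulate, theta1, theta2, and so on) are hypothetical. The SSE is taken here as the sum, over replications, of the squared difference between each estimate and the actual error rate of the fitted rule.

```python
import itertools
import math
import random


def fit_rule(x1, x2, prior1):
    """Fit an independence rule (assumed here) from two binary training samples."""
    p = len(x1[0])
    # Add-one smoothing keeps each estimated probability strictly inside (0, 1).
    t1 = [(sum(row[j] for row in x1) + 1) / (len(x1) + 2) for j in range(p)]
    t2 = [(sum(row[j] for row in x2) + 1) / (len(x2) + 2) for j in range(p)]
    return t1, t2, prior1


def assign(rule, x):
    """Assign x to group 1 when the estimated log posterior odds are non-negative."""
    t1, t2, prior1 = rule
    score = math.log(prior1 / (1 - prior1))
    for xj, a, b in zip(x, t1, t2):
        score += math.log(a / b) if xj else math.log((1 - a) / (1 - b))
    return 1 if score >= 0 else 2


def pattern_prob(x, theta):
    """Probability of binary pattern x under independent Bernoulli(theta) variables."""
    return math.prod(t if xj else 1 - t for xj, t in zip(x, theta))


def actual_error(rule, theta1, theta2, prior1):
    """Exact error rate of a fitted rule, summed over all 2^p binary patterns."""
    err = 0.0
    for x in itertools.product((0, 1), repeat=len(theta1)):
        if assign(rule, x) == 2:
            err += prior1 * pattern_prob(x, theta1)        # group 1 sent to group 2
        else:
            err += (1 - prior1) * pattern_prob(x, theta2)  # group 2 sent to group 1
    return err


def resubstitution(x1, x2, prior1):
    """Error estimate from reclassifying the same data used to fit the rule."""
    rule = fit_rule(x1, x2, prior1)
    e1 = sum(assign(rule, x) == 2 for x in x1) / len(x1)
    e2 = sum(assign(rule, x) == 1 for x in x2) / len(x2)
    return prior1 * e1 + (1 - prior1) * e2


def leave_one_out(x1, x2, prior1):
    """Error estimate from refitting without each observation, then classifying it."""
    m1 = sum(assign(fit_rule(x1[:i] + x1[i + 1:], x2, prior1), x1[i]) == 2
             for i in range(len(x1)))
    m2 = sum(assign(fit_rule(x1, x2[:i] + x2[i + 1:], prior1), x2[i]) == 1
             for i in range(len(x2)))
    return prior1 * m1 / len(x1) + (1 - prior1) * m2 / len(x2)


def draw(rng, theta, n):
    """Draw n observations of independent binary variables."""
    return [[int(rng.random() < t) for t in theta] for _ in range(n)]


def simulate(theta1, theta2, n, prior1=0.5, replications=100, seed=0):
    """Accumulate the sum of squared errors of each estimator; requires n >= 2."""
    rng = random.Random(seed)
    sse = {"resubstitution": 0.0, "leave_one_out": 0.0}
    for _ in range(replications):
        x1, x2 = draw(rng, theta1, n), draw(rng, theta2, n)
        truth = actual_error(fit_rule(x1, x2, prior1), theta1, theta2, prior1)
        sse["resubstitution"] += (resubstitution(x1, x2, prior1) - truth) ** 2
        sse["leave_one_out"] += (leave_one_out(x1, x2, prior1) - truth) ** 2
    return sse
```

Calling simulate([0.7, 0.6, 0.8], [0.3, 0.4, 0.2], n=10) returns one SSE value per estimator, and a smaller value indicates an estimator that tracks the actual error rate more closely under these assumed settings. Varying the number of variables, n, and prior1 mirrors three of the four experimental factors named in the abstract; correlation between variables is not modelled in this sketch.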

DOI	10.11648/j.ajtas.20160504.12
Published in	American Journal of Theoretical and Applied Statistics (Volume 5, Issue 4, July 2016)
Page(s)	173-179
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Discriminant Analysis, Error Rate, Monte Carlo Simulation, Error Rate Estimators

Cite This Article

APA Style

Egbo Ikechukwu. (2016). Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables. American Journal of Theoretical and Applied Statistics, 5(4), 173-179. https://doi.org/10.11648/j.ajtas.20160504.12

ACS Style

Egbo Ikechukwu. Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables. Am. J. Theor. Appl. Stat. 2016, 5(4), 173-179. doi: 10.11648/j.ajtas.20160504.12

AMA Style

Egbo Ikechukwu. Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables. Am J Theor Appl Stat. 2016;5(4):173-179. doi: 10.11648/j.ajtas.20160504.12

@article{10.11648/j.ajtas.20160504.12,
  author = {Egbo Ikechukwu},
  title = {Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables},
  journal = {American Journal of Theoretical and Applied Statistics},
  volume = {5},
  number = {4},
  pages = {173-179},
  doi = {10.11648/j.ajtas.20160504.12},
  url = {https://doi.org/10.11648/j.ajtas.20160504.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20160504.12},
  abstract = {Classification problems often suffer from small samples in conjunction with a large number of features, which makes error estimation problematic. When a sample is small, there are insufficient data to split the sample, and the same data are used for both classifier design and error estimation. Error estimation can suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. This paper is concerned with the evaluation of error rate estimators in two-group discriminant analysis with multivariate binary variables. The behaviour of eight of the most commonly used estimators is compared and contrasted by means of Monte Carlo simulation. The criterion used for comparing these error rate estimators is the sum squared error rate (SSE). Four experimental factors are considered in the simulation, namely the number of variables, the sample size relative to the number of variables, the prior probability, and the correlation between the variables in the populations. From the analysis carried out, the estimators can be ranked as follows: DS, O, OS, U, R, JK, P and D.},
  year = {2016}
}

TY  - JOUR
T1  - Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables
AU  - Egbo Ikechukwu
Y1  - 2016/06/04
PY  - 2016
N1  - https://doi.org/10.11648/j.ajtas.20160504.12
DO  - 10.11648/j.ajtas.20160504.12
T2  - American Journal of Theoretical and Applied Statistics
JF  - American Journal of Theoretical and Applied Statistics
JO  - American Journal of Theoretical and Applied Statistics
SP  - 173
EP  - 179
PB  - Science Publishing Group
SN  - 2326-9006
UR  - https://doi.org/10.11648/j.ajtas.20160504.12
AB  - Classification problems often suffer from small samples in conjunction with a large number of features, which makes error estimation problematic. When a sample is small, there are insufficient data to split the sample, and the same data are used for both classifier design and error estimation. Error estimation can suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. This paper is concerned with the evaluation of error rate estimators in two-group discriminant analysis with multivariate binary variables. The behaviour of eight of the most commonly used estimators is compared and contrasted by means of Monte Carlo simulation. The criterion used for comparing these error rate estimators is the sum squared error rate (SSE). Four experimental factors are considered in the simulation, namely the number of variables, the sample size relative to the number of variables, the prior probability, and the correlation between the variables in the populations. From the analysis carried out, the estimators can be ranked as follows: DS, O, OS, U, R, JK, P and D.
VL  - 5
IS  - 4
ER  -

Author Information

Egbo Ikechukwu

Department of Mathematics, Alvan Ikoku Federal College of Education, Owerri, Nigeria
