American Journal of Theoretical and Applied Statistics

| Peer-Reviewed |

Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables

Received: 1 April 2016    Accepted: 19 April 2016    Published: 4 June 2016

Abstract

Classification problems often suffer from small samples combined with a large number of features, which makes error estimation problematic. When a sample is small, there are insufficient data to split the sample, so the same data are used for both classifier design and error estimation. Error estimation can then suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. This paper is concerned with the evaluation of error rate estimators in two-group discriminant analysis with multivariate binary variables. The behaviour of the eight most commonly used estimators is compared and contrasted by means of Monte Carlo simulation. The criterion used for comparing the error rate estimators is the sum of squared errors (SSE). Four experimental factors are considered in the simulation, namely the number of variables, the sample size relative to the number of variables, the prior probability, and the correlation between the variables in the populations. From the analysis carried out, the estimators can be ranked as follows: DS, O, OS, U, R, JK, P and D.
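
The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the classifier (an independent-Bernoulli plug-in rule with add-half smoothing), the two estimators shown (resubstitution and leave-one-out, which are stand-ins and not necessarily among the eight abbreviated in the abstract), equal prior probabilities, uncorrelated variables, and all function and parameter names are assumptions made for illustration. The actual error of each designed rule is computed exactly by enumerating every binary pattern, and each estimator's squared deviation from it is summed over replications.

```python
import itertools
import random


def draw(n, probs, rng):
    """Draw n binary vectors with independent components, P(x_j = 1) = probs[j]."""
    return [tuple(int(rng.random() < p) for p in probs) for _ in range(n)]


def fit(s1, s2):
    """Estimate per-variable success rates in each group (add-half smoothing)."""
    def rates(s):
        return [(sum(x[j] for x in s) + 0.5) / (len(s) + 1.0)
                for j in range(len(s[0]))]
    return rates(s1), rates(s2)


def likelihood(x, probs):
    """Probability of pattern x under independent Bernoulli components."""
    out = 1.0
    for xj, p in zip(x, probs):
        out *= p if xj else 1.0 - p
    return out


def classify(x, model):
    """Assign x to group 1 or 2 by the larger likelihood (equal priors assumed)."""
    return 1 if likelihood(x, model[0]) >= likelihood(x, model[1]) else 2


def actual_error(model, p1, p2):
    """Exact error of a designed rule, enumerating all 2^r binary patterns."""
    err = 0.0
    for x in itertools.product((0, 1), repeat=len(p1)):
        if classify(x, model) == 2:
            err += 0.5 * likelihood(x, p1)  # group-1 pattern sent to group 2
        else:
            err += 0.5 * likelihood(x, p2)  # group-2 pattern sent to group 1
    return err


def resubstitution(s1, s2):
    """Error rate of the rule on the same data used to design it."""
    model = fit(s1, s2)
    wrong = sum(classify(x, model) != 1 for x in s1)
    wrong += sum(classify(x, model) != 2 for x in s2)
    return wrong / (len(s1) + len(s2))


def leave_one_out(s1, s2):
    """Refit without each observation in turn, then classify the held-out one.

    Requires at least two observations per group.
    """
    wrong = 0
    for i, x in enumerate(s1):
        wrong += classify(x, fit(s1[:i] + s1[i + 1:], s2)) != 1
    for i, x in enumerate(s2):
        wrong += classify(x, fit(s1, s2[:i] + s2[i + 1:])) != 2
    return wrong / (len(s1) + len(s2))


def simulate(p1, p2, n_per_group, reps, seed=0):
    """Sum of squared deviations of each estimator from the actual error."""
    rng = random.Random(seed)
    sse = {"resubstitution": 0.0, "leave_one_out": 0.0}
    for _ in range(reps):
        s1 = draw(n_per_group, p1, rng)
        s2 = draw(n_per_group, p2, rng)
        truth = actual_error(fit(s1, s2), p1, p2)
        sse["resubstitution"] += (resubstitution(s1, s2) - truth) ** 2
        sse["leave_one_out"] += (leave_one_out(s1, s2) - truth) ** 2
    return sse
```

A call such as `simulate([0.3, 0.4, 0.5], [0.6, 0.7, 0.8], n_per_group=10, reps=200)` returns one SSE value per estimator, with smaller values indicating estimates closer to the actual error. Varying the number of variables and `n_per_group` mimics two of the four experimental factors; unequal priors and correlated variables would need a different sampling scheme than the one assumed here.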

DOI 10.11648/j.ajtas.20160504.12
Published in American Journal of Theoretical and Applied Statistics (Volume 5, Issue 4, July 2016)
Page(s) 173-179
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Discriminant Analysis, Error Rate, Monte Carlo Simulation, Error Rate Estimators

Cite This Article
  • APA Style

    Egbo Ikechukwu. (2016). Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables. American Journal of Theoretical and Applied Statistics, 5(4), 173-179. https://doi.org/10.11648/j.ajtas.20160504.12

    ACS Style

    Egbo Ikechukwu. Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables. Am. J. Theor. Appl. Stat. 2016, 5(4), 173-179. doi: 10.11648/j.ajtas.20160504.12

    AMA Style

    Egbo Ikechukwu. Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables. Am J Theor Appl Stat. 2016;5(4):173-179. doi: 10.11648/j.ajtas.20160504.12

  • @article{10.11648/j.ajtas.20160504.12,
      author = {Egbo Ikechukwu},
      title = {Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {5},
      number = {4},
      pages = {173-179},
      doi = {10.11648/j.ajtas.20160504.12},
      url = {https://doi.org/10.11648/j.ajtas.20160504.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20160504.12},
     year = {2016}
    }
    

  • TY  - JOUR
    T1  - Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables
    AU  - Egbo Ikechukwu
    Y1  - 2016/06/04
    PY  - 2016
    N1  - https://doi.org/10.11648/j.ajtas.20160504.12
    DO  - 10.11648/j.ajtas.20160504.12
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 173
    EP  - 179
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20160504.12
    VL  - 5
    IS  - 4
    ER  - 

Author Information
  • Department of Mathematics, Alvan Ikoku Federal College of Education, Owerri, Nigeria
