Peer-Reviewed

Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data

Received: 8 October 2019    Accepted: 23 October 2019    Published: 30 October 2019
Abstract

Establishing a viable statistical model for DMFT index data, which are important in oral health studies, becomes difficult when the data are over-dispersed. Over-dispersion caused by unobserved heterogeneity makes it hard to fit the more common count models to such data, and failing to account for that heterogeneity can undermine the validity of the empirical results. Because other count data models are limited in their ability to handle over-dispersion that arises from heterogeneity, this paper formulates an alternative model that captures it: the Bayesian finite mixture negative binomial (BFMNB) regression model. The model is first applied to simulated over-dispersed count data to determine the number of negative binomial components to be mixed, and then to DMFT index data. The three-component model (BFMNB-3) is useful here because the data were collected from a heterogeneous population. The simulation results show that the 3-component Bayesian finite mixture of NB regression models converges and is sufficient to model the over-dispersed simulated count data. Applied to the DMFT index data, BFMNB-3 identifies three methods as the best for preventing tooth decay in seven-year-old children in Belo Horizonte (Brazil): all treatments combined (all methods together), mouthwash with 0.2% sodium fluoride, and oral hygiene. The Bayesian negative binomial (BNB) model, which does not account for the heterogeneity present among the methods, identifies only all treatments combined and mouthwash with 0.2% sodium fluoride, even though these two were not the only significant methods. On these results, BFMNB-3 outperforms the BNB model.
All analyses were carried out in R.
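
To illustrate why a finite mixture of negative binomials can absorb over-dispersion, here is a minimal sketch of a 3-component NB mixture without covariates. It is written in Python rather than the R used by the authors. The mean/size parameterization (variance = mu + mu^2 / r), the function names, and the weights and component parameters are all illustrative assumptions; none of them are taken from the paper.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and size r (variance mu + mu**2 / r)."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def mixture_pmf(y, weights, mus, rs):
    """Pmf of a finite NB mixture: weighted sum of the component pmfs."""
    return sum(w * nb_pmf(y, mu, r) for w, mu, r in zip(weights, mus, rs))


def mixture_mean_var(weights, mus, rs):
    """Mean and variance of the mixture, from the component moments."""
    mean = sum(w * mu for w, mu in zip(weights, mus))
    # Second moment of each component is its variance plus its squared mean.
    second = sum(
        w * (mu + mu**2 / r + mu**2) for w, mu, r in zip(weights, mus, rs)
    )
    return mean, second - mean**2


# Hypothetical values chosen only for illustration (not estimates from the paper).
weights = [0.5, 0.3, 0.2]   # mixing proportions, sum to 1
mus = [1.0, 3.0, 6.0]       # component means
rs = [2.0, 2.0, 2.0]        # component size (dispersion) parameters

mean, var = mixture_mean_var(weights, mus, rs)
dispersion_index = var / mean  # values above 1 indicate over-dispersion
total_mass = sum(mixture_pmf(y, weights, mus, rs) for y in range(500))
```

In a regression setting each component mean would typically depend on covariates through a log link, and a Bayesian fit would place priors on the weights and component parameters; the sketch leaves both out to keep the mixture structure visible.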

Published in International Journal of Data Science and Analysis (Volume 5, Issue 5)
DOI 10.11648/j.ijdsa.20190505.15
Page(s) 104-110
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

BFMNB-3 Model, DMFT Index Data, BNB

Cite This Article
  • APA Style

    Kipngetich Gideon, Anthony Wanjoya, Samuel Mwalili. (2019). Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data. International Journal of Data Science and Analysis, 5(5), 104-110. https://doi.org/10.11648/j.ijdsa.20190505.15

    ACS Style

    Kipngetich Gideon; Anthony Wanjoya; Samuel Mwalili. Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data. Int. J. Data Sci. Anal. 2019, 5(5), 104-110. doi: 10.11648/j.ijdsa.20190505.15

    AMA Style

    Kipngetich Gideon, Anthony Wanjoya, Samuel Mwalili. Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data. Int J Data Sci Anal. 2019;5(5):104-110. doi: 10.11648/j.ijdsa.20190505.15

  • @article{10.11648/j.ijdsa.20190505.15,
      author = {Kipngetich Gideon and Anthony Wanjoya and Samuel Mwalili},
      title = {Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data},
      journal = {International Journal of Data Science and Analysis},
      volume = {5},
      number = {5},
      pages = {104-110},
      doi = {10.11648/j.ijdsa.20190505.15},
      url = {https://doi.org/10.11648/j.ijdsa.20190505.15},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20190505.15},
     year = {2019}
    }
    

  • TY  - JOUR
    T1  - Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data
    AU  - Kipngetich Gideon
    AU  - Anthony Wanjoya
    AU  - Samuel Mwalili
    Y1  - 2019/10/30
    PY  - 2019
    N1  - https://doi.org/10.11648/j.ijdsa.20190505.15
    DO  - 10.11648/j.ijdsa.20190505.15
    T2  - International Journal of Data Science and Analysis
    JF  - International Journal of Data Science and Analysis
    JO  - International Journal of Data Science and Analysis
    SP  - 104
    EP  - 110
    PB  - Science Publishing Group
    SN  - 2575-1891
    UR  - https://doi.org/10.11648/j.ijdsa.20190505.15
    VL  - 5
    IS  - 5
    ER  - 

Author Information
  • Kipngetich Gideon, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

  • Anthony Wanjoya, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

  • Samuel Mwalili, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
