Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data

Kipngetich Gideon; Anthony Wanjoya; Samuel Mwalili

doi:10.11648/j.ijdsa.20190505.15

Peer-Reviewed

Published in International Journal of Data Science and Analysis (Volume 5, Issue 5)

Received: 8 October 2019 Accepted: 23 October 2019 Published: 30 October 2019

Abstract

Establishing a viable statistical model for DMFT (decayed, missing and filled teeth) index data, which are important in oral health studies, is difficult when the data are over-dispersed. Over-dispersion caused by unobserved heterogeneity makes the more common count models a poor fit, and failing to account for that heterogeneity can undermine the validity of the empirical results. Because other count data models are limited in their ability to handle over-dispersion arising from heterogeneity in DMFT index data, this paper formulates an alternative that captures it: a Bayesian finite mixture negative binomial regression model. The model is first applied to simulated over-dispersed count data to determine the number of negative binomial components to mix, and then to DMFT index data. A three-component Bayesian finite mixture negative binomial (BFMNB-3) regression model is useful here because the data were collected from a heterogeneous population. The simulation results show that the three-component model converges and is sufficient to model the simulated over-dispersed counts. Applied to the DMFT index data, BFMNB-3 identifies three methods as the best at preventing tooth decay in seven-year-old children in Belo Horizonte (Brazil): all treatments together, mouthwash with 0.2% sodium fluoride, and oral hygiene. The Bayesian negative binomial (BNB) model, because of the heterogeneity present across methods, identifies only the first two, although these were not the only significant methods. The results therefore indicate that BFMNB-3 is superior to the BNB model for these data. All analyses were carried out in R.
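
To make the abstract's central idea concrete, the following is a minimal Python sketch of a three-component negative binomial mixture and of why it yields over-dispersed counts. It is not the authors' code (the paper used R), and it covers only the mixture likelihood, not the priors, the regression on covariates, or the posterior sampling. The mean/size parameterization and every numeric value (`WEIGHTS`, `MUS`, `SIZES`) are assumptions chosen purely for illustration.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and size r (variance mu + mu**2 / r)."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def mixture_pmf(y, weights, mus, sizes):
    """pmf of a finite mixture: weighted sum of the component pmfs."""
    return sum(w * nb_pmf(y, m, r) for w, m, r in zip(weights, mus, sizes))


def mixture_moments(weights, mus, sizes):
    """Mean and variance of the mixture from the component moments."""
    mean = sum(w * m for w, m in zip(weights, mus))
    # E[Y^2] of each component is its variance plus its squared mean.
    second = sum(
        w * (m + m * m / r + m * m) for w, m, r in zip(weights, mus, sizes)
    )
    return mean, second - mean ** 2


# Hypothetical mixing weights, component means and sizes (not from the paper).
WEIGHTS = (0.5, 0.3, 0.2)
MUS = (0.5, 2.0, 6.0)
SIZES = (2.0, 3.0, 4.0)

mean, var = mixture_moments(WEIGHTS, MUS, SIZES)
# The pmf should sum to (numerically) one over a long enough range of counts.
total = sum(mixture_pmf(y, WEIGHTS, MUS, SIZES) for y in range(500))

print(f"mixture mean     = {mean:.3f}")
print(f"mixture variance = {var:.3f}")
print(f"pmf total        = {total:.6f}")
```

In this sketch the mixture variance exceeds the mixture mean, which is the over-dispersion the abstract describes: it comes both from each negative binomial component and from the spread between the component means.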

Published in	International Journal of Data Science and Analysis (Volume 5, Issue 5)
DOI	10.11648/j.ijdsa.20190505.15
Page(s)	104-110
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

BFMNB-3 Model, DMFT Index Data, BNB

Cite This Article

APA Style

Kipngetich Gideon, Anthony Wanjoya, Samuel Mwalili. (2019). Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data. International Journal of Data Science and Analysis, 5(5), 104-110. https://doi.org/10.11648/j.ijdsa.20190505.15

ACS Style

Kipngetich Gideon; Anthony Wanjoya; Samuel Mwalili. Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data. Int. J. Data Sci. Anal. 2019, 5(5), 104-110. doi: 10.11648/j.ijdsa.20190505.15

AMA Style

Kipngetich Gideon, Anthony Wanjoya, Samuel Mwalili. Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data. Int J Data Sci Anal. 2019;5(5):104-110. doi: 10.11648/j.ijdsa.20190505.15

@article{10.11648/j.ijdsa.20190505.15,
  author = {Kipngetich Gideon and Anthony Wanjoya and Samuel Mwalili},
  title = {Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data},
  journal = {International Journal of Data Science and Analysis},
  volume = {5},
  number = {5},
  pages = {104-110},
  doi = {10.11648/j.ijdsa.20190505.15},
  url = {https://doi.org/10.11648/j.ijdsa.20190505.15},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20190505.15},
 year = {2019}
}

TY  - JOUR
T1  - Bayesian Finite Mixture Negative Binomial Model for Over-dispersed Count Data with Application to DMFT Index Data
AU  - Kipngetich Gideon
AU  - Anthony Wanjoya
AU  - Samuel Mwalili
Y1  - 2019/10/30
PY  - 2019
N1  - https://doi.org/10.11648/j.ijdsa.20190505.15
DO  - 10.11648/j.ijdsa.20190505.15
T2  - International Journal of Data Science and Analysis
JF  - International Journal of Data Science and Analysis
JO  - International Journal of Data Science and Analysis
SP  - 104
EP  - 110
PB  - Science Publishing Group
SN  - 2575-1891
UR  - https://doi.org/10.11648/j.ijdsa.20190505.15
VL  - 5
IS  - 5
ER  -

Author Information

Kipngetich Gideon
Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

Anthony Wanjoya
Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

Samuel Mwalili
Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
