Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data

Kipkorir Collins; Anthony Waititu; Anthony Wanjoya

doi:10.11648/j.ijdsa.20200605.15

Published in International Journal of Data Science and Analysis (Volume 6, Issue 5)

Received: 2 October 2020 Accepted: 20 October 2020 Published: 26 October 2020

Abstract

In modelling count data, least squares regression models suffer from several methodological and statistical limitations when the dependent variable is a discrete, non-negative integer count. Unlike the classical regression model, count data models are non-linear, and the response variable is discrete and takes non-negative values only. A good starting point for modelling count data is the Poisson regression model, since it suits the natural properties of count data. However, its assumption of equi-dispersion (variance equal to the mean) renders it inappropriate for modelling over-dispersed data. The Negative Binomial regression model has been widely used and is often considered the default regression model for over-dispersed count data. This model is a modification of the Poisson regression model and, though widely used, it might not be the best model for over-dispersion; other models have been found to perform better. Over-dispersion in this study was defined relative to the Poisson model. This study models over-dispersed count data using a discrete Weibull (DW) regression model and an artificial neural network model with a median neuron in the hidden layer. After fitting the two models to simulated data and real data, the artificial neural network model outperformed the discrete Weibull regression model. Application to a data set from a German health survey gave an RMSE of 69.0668 for the DW regression model and 35.5652 for the artificial neural network.
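The quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes the type 1 discrete Weibull distribution of Nakagawa and Osaki, with probability mass function P(Y = y) = q^(y^β) − q^((y+1)^β) for y = 0, 1, 2, ..., and measures over-dispersion relative to the Poisson model as a variance-to-mean ratio above 1. The function names (`dw_pmf`, `dw_sample`, `dispersion_index`, `rmse`) and the parameter values are hypothetical choices for illustration; this is not the authors' code, data, or fitting procedure.

```python
import numpy as np


def dw_pmf(y, q, beta):
    """Type 1 discrete Weibull pmf: q**(y**beta) - q**((y + 1)**beta)."""
    y = np.asarray(y, dtype=float)
    return q ** (y ** beta) - q ** ((y + 1.0) ** beta)


def dw_sample(n, q, beta, rng):
    """Draw n counts by inverting the survival function P(Y >= y) = q**(y**beta)."""
    u = rng.random(n)  # uniform on [0, 1)
    x = (np.log1p(-u) / np.log(q)) ** (1.0 / beta)
    # ceil(x) - 1 is the sampled count; clip the edge case u == 0 to zero.
    return np.maximum(np.ceil(x) - 1.0, 0.0).astype(int)


def dispersion_index(counts):
    """Sample variance divided by sample mean; equals 1 in expectation for Poisson."""
    counts = np.asarray(counts, dtype=float)
    return float(np.var(counts, ddof=1) / np.mean(counts))


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative parameter values only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
counts = dw_sample(100_000, q=0.8, beta=1.0, rng=rng)
print("dispersion index:", dispersion_index(counts))
print("RMSE of a constant mean prediction:",
      rmse(counts, np.full(counts.shape, counts.mean())))
```

With β = 1 the distribution reduces to the geometric distribution, whose variance-to-mean ratio is 1/(1 − q), so the simulated counts are over-dispersed relative to a Poisson model with the same mean. The same `rmse` function applies to predictions from any fitted model, which is how the two RMSE values in the abstract can be compared.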

Page(s)	153-162
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2020. Published by Science Publishing Group

Keywords

Over-dispersion, Count, Discrete Weibull, Artificial Neural Network

Cite This Article

APA Style

Kipkorir Collins, Anthony Waititu, Anthony Wanjoya. (2020). Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data. International Journal of Data Science and Analysis, 6(5), 153-162. https://doi.org/10.11648/j.ijdsa.20200605.15

ACS Style

Kipkorir Collins; Anthony Waititu; Anthony Wanjoya. Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data. Int. J. Data Sci. Anal. 2020, 6(5), 153-162. doi: 10.11648/j.ijdsa.20200605.15

AMA Style

Kipkorir Collins, Anthony Waititu, Anthony Wanjoya. Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data. Int J Data Sci Anal. 2020;6(5):153-162. doi: 10.11648/j.ijdsa.20200605.15

@article{10.11648/j.ijdsa.20200605.15,
  author = {Kipkorir Collins and Anthony Waititu and Anthony Wanjoya},
  title = {Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data},
  journal = {International Journal of Data Science and Analysis},
  volume = {6},
  number = {5},
  pages = {153-162},
  doi = {10.11648/j.ijdsa.20200605.15},
  url = {https://doi.org/10.11648/j.ijdsa.20200605.15},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20200605.15},
  abstract = {In modelling count data, the use of least square regression models suffers several methodological limitations and statistical properties in instances of discrete, non-negative integer count of a dependent variable. Unlike the classical regression model, count data models are non-linear with many properties of the response variable relating to discreteness, non-linearity and deal with non-negative values only. A good starting point for modelling count data is the Poisson regression model since it lends itself well with the nature properties of count data. However, the limitation of equi-dispersion renders it inappropriate for modelling over-dispersed data. Negative Binomial regression model has been widely used and considered as the default regression model for over-dispersed count data. This model is a modification of Poisson regression model and though widely used, it might not be the best model for over-dispersion and other models have been found to perform better. Over-dispersion in this study was defined relative to the Poisson model. This study models over-dispersed count data using discrete Weibull regression model and artificial neural network model with a median neuron in the hidden layer. After fitting the two models on simulated data and real data, the artificial neural network model outperformed the discrete Weibull regression model. Application on data set from German health survey gave RMSE of DW regression model as 69.0668 and 35.5652 for the artificial neural network.},
  year = {2020}
}

TY  - JOUR
T1  - Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data
AU  - Kipkorir Collins
AU  - Anthony Waititu
AU  - Anthony Wanjoya
Y1  - 2020/10/26
PY  - 2020
N1  - https://doi.org/10.11648/j.ijdsa.20200605.15
DO  - 10.11648/j.ijdsa.20200605.15
T2  - International Journal of Data Science and Analysis
JF  - International Journal of Data Science and Analysis
JO  - International Journal of Data Science and Analysis
SP  - 153
EP  - 162
PB  - Science Publishing Group
SN  - 2575-1891
UR  - https://doi.org/10.11648/j.ijdsa.20200605.15
AB  - In modelling count data, the use of least square regression models suffers several methodological limitations and statistical properties in instances of discrete, non-negative integer count of a dependent variable. Unlike the classical regression model, count data models are non-linear with many properties of the response variable relating to discreteness, non-linearity and deal with non-negative values only. A good starting point for modelling count data is the Poisson regression model since it lends itself well with the nature properties of count data. However, the limitation of equi-dispersion renders it inappropriate for modelling over-dispersed data. Negative Binomial regression model has been widely used and considered as the default regression model for over-dispersed count data. This model is a modification of Poisson regression model and though widely used, it might not be the best model for over-dispersion and other models have been found to perform better. Over-dispersion in this study was defined relative to the Poisson model. This study models over-dispersed count data using discrete Weibull regression model and artificial neural network model with a median neuron in the hidden layer. After fitting the two models on simulated data and real data, the artificial neural network model outperformed the discrete Weibull regression model. Application on data set from German health survey gave RMSE of DW regression model as 69.0668 and 35.5652 for the artificial neural network.
VL  - 6
IS  - 5
ER  -

Author Information

Kipkorir Collins

Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

Anthony Waititu

Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

Anthony Wanjoya

Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
