Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data
International Journal of Data Science and Analysis
Volume 6, Issue 5, October 2020, Pages: 153-162
Received: Oct. 2, 2020; Accepted: Oct. 20, 2020; Published: Oct. 26, 2020
Kipkorir Collins, Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Anthony Waititu, Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Anthony Wanjoya, Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
In modelling count data, least squares regression suffers from several methodological and statistical limitations when the dependent variable is a discrete, non-negative integer count. Unlike the classical regression model, count data models are non-linear, and the key properties of the response variable are discreteness, non-linearity and non-negativity. A good starting point for modelling count data is the Poisson regression model, since it suits these natural properties of count data. However, its equi-dispersion assumption (variance equal to the mean) makes it inappropriate for modelling over-dispersed data. The negative binomial regression model, a modification of the Poisson regression model, has been widely used and is considered the default regression model for over-dispersed count data. Though widely used, it might not be the best model for over-dispersion, and other models have been found to perform better. In this study, over-dispersion was defined relative to the Poisson model. The study models over-dispersed count data using a discrete Weibull (DW) regression model and an artificial neural network model with a median neuron in the hidden layer. After both models were fitted to simulated and real data, the artificial neural network model outperformed the discrete Weibull regression model. Applied to a data set from a German health survey, the DW regression model gave an RMSE of 69.0668, compared with 35.5652 for the artificial neural network.
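The statistical ingredients named in the abstract can be sketched in Python as a minimal illustration, under stated assumptions. It shows the type I discrete Weibull probability mass function of Nakagawa and Osaki, P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, ..., with 0 < q < 1 and beta > 0; its median; a log(-log q) link from a linear predictor to q (an assumption here, since the abstract does not state the paper's link function); the variance-to-mean ratio as a quick check of over-dispersion relative to the Poisson model; and the RMSE used to compare the fitted models. All function names are hypothetical. This is not the authors' code, and it does not fit either model or the neural network.

```python
import math
from statistics import mean, variance


def dw_pmf(y: int, q: float, beta: float) -> float:
    """Type I discrete Weibull pmf: P(Y = y) = q**(y**beta) - q**((y + 1)**beta)."""
    if y < 0:
        return 0.0
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


def dw_cdf(y: int, q: float, beta: float) -> float:
    """Cumulative probability P(Y <= y) = 1 - q**((y + 1)**beta) for y >= 0."""
    if y < 0:
        return 0.0
    return 1.0 - q ** ((y + 1) ** beta)


def dw_median(q: float, beta: float) -> int:
    """Smallest y with P(Y <= y) >= 0.5, found by direct search."""
    if not (0.0 < q < 1.0 and beta > 0.0):
        raise ValueError("need 0 < q < 1 and beta > 0")
    y = 0
    while dw_cdf(y, q, beta) < 0.5:  # terminates because the cdf tends to 1
        y += 1
    return y


def q_from_linear_predictor(eta: float) -> float:
    """ASSUMED link log(-log q) = eta, i.e. q = exp(-exp(eta)), keeping 0 < q < 1."""
    return math.exp(-math.exp(eta))


def dispersion_index(counts: list[int]) -> float:
    """Sample variance / sample mean; a Poisson variable has variance equal
    to its mean, so values well above 1 suggest over-dispersion."""
    return variance(counts) / mean(counts)


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean squared error between observed and predicted counts."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


if __name__ == "__main__":
    # With beta = 1 the discrete Weibull reduces to a geometric distribution.
    print(dw_pmf(0, 0.5, 1.0), dw_pmf(1, 0.5, 1.0))
    print(dw_median(0.9, 1.0))
    print(dispersion_index([0, 0, 1, 5, 9]))
    print(rmse([1, 2, 3], [1, 2, 5]))
```

With beta = 1 the pmf becomes q^y(1 - q), which gives a convenient sanity check. A beta below 1 gives a heavier right tail, which is one way the discrete Weibull can accommodate over-dispersion that the Poisson model cannot.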
Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data, International Journal of Data Science and Analysis. Vol. 6, No. 5, 2020, pp. 153-162.