A better understanding of models for count data is important, especially when the data contain a significant number of zeros, that is, zeros that cannot simply be ignored because they carry important information.
In a recent paper, Kasyoki, Ngesa & Waititu investigated the performance of different count data models on data sets with different proportions of zeros. Count data sets can differ in their characteristics, so not every data set suits every count data model. A common assumption has been that all count data follow a Poisson distribution, so that the mean and the variance are equal. However, this is not always the case, as the data may deviate from this assumption. In addition, a particular count data model may only handle data with a particular proportion of zeros. In some cases these zeros cannot be ignored because they are meaningful. The authors used a simulation approach in which the Akaike Information Criterion (AIC) was used to compare the models' goodness of fit to the simulated data sets.
Poisson, negative binomial and hurdle models were compared on the different data sets. Specifically, data sets with zero proportions of 0.05, 0.10, 0.25, 0.50, 0.75 and 0.90 were simulated. The data sets were found to be over-dispersed, meaning that the variance was greater than the mean. This departs from the Poisson model's assumption that the mean and the variance are equal.
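To make the setup concrete, here is a minimal Python sketch that generates count data with a chosen proportion of zeros and checks for over-dispersion. The generating scheme is an assumption made for illustration only (a zero with probability `zero_prop`, otherwise one plus a truncated exponential draw); the article does not say how the authors simulated their data, and the names `simulate_counts` and `summaries` are hypothetical.

```python
import random
from statistics import mean, variance

def simulate_counts(n, zero_prop, rng):
    """Draw n counts: zero with probability zero_prop, else a positive count.

    Assumed scheme, not the authors' method: positive counts are
    1 + floor(Exponential(rate=0.2)), which is geometric-like and skewed.
    """
    counts = []
    for _ in range(n):
        if rng.random() < zero_prop:
            counts.append(0)
        else:
            counts.append(1 + int(rng.expovariate(0.2)))
    return counts

rng = random.Random(42)  # fixed seed so the run is repeatable
summaries = {}
for p in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90):
    data = simulate_counts(5000, p, rng)
    observed_zero_prop = data.count(0) / len(data)
    # Store (observed zero proportion, sample mean, sample variance).
    summaries[p] = (observed_zero_prop, mean(data), variance(data))

for p, (z, m, v) in summaries.items():
    print(f"target={p:.2f} zeros={z:.3f} mean={m:.2f} variance={v:.2f}")
```

Under this assumed scheme the sample variance exceeds the sample mean at every zero proportion, which is the over-dispersion pattern described above.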
AIC is usually interpreted in a “lower is better” fashion. The results showed that the Poisson model performed poorly on the over-dispersed count data sets, as it had higher AICs. The negative binomial model performed better on over-dispersed data sets with a zero proportion below approximately 0.3. The hurdle model fitted better on data sets with a zero proportion of 0.3 and above.
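As an illustration of how such an AIC comparison works, the sketch below fits a Poisson model and a simple hurdle model to a small made-up sample and compares their AICs, using AIC = 2k - 2 ln(L) with k the number of fitted parameters. Several things here are assumptions: the hurdle model uses a zero-truncated Poisson for the positive counts (the article does not say which count component the authors used), the sample is invented, and the function names are hypothetical. The negative binomial model is left out because fitting it needs a numerical optimizer.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

def poisson_aic(counts):
    lam = sum(counts) / len(counts)  # the Poisson MLE is the sample mean
    loglik = sum(c * math.log(lam) - lam - math.lgamma(c + 1) for c in counts)
    return aic(loglik, 1)

def truncated_rate(positive_mean, tol=1e-10):
    """Solve lam / (1 - exp(-lam)) = positive_mean by bisection.

    Assumes positive_mean > 1, so the root lies in (0, positive_mean).
    """
    lo, hi = 1e-9, positive_mean
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < positive_mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def hurdle_poisson_aic(counts):
    """AIC of a hurdle model: Bernoulli zero part plus zero-truncated Poisson.

    Assumes the sample contains both zeros and positive counts.
    """
    positives = [c for c in counts if c > 0]
    n, n_pos = len(counts), len(positives)
    n_zero = n - n_pos
    pi_zero = n_zero / n  # MLE of the zero probability
    lam = truncated_rate(sum(positives) / n_pos)
    zero_part = n_zero * math.log(pi_zero) + n_pos * math.log(1 - pi_zero)
    count_part = sum(
        c * math.log(lam) - lam - math.lgamma(c + 1)
        - math.log(-math.expm1(-lam))  # normalise for truncation at zero
        for c in positives
    )
    return aic(zero_part + count_part, 2)

sample = [0, 0, 0, 0, 0, 1, 2, 3, 7, 12]  # made-up data, half of it zeros
print("Poisson AIC:", round(poisson_aic(sample), 2))
print("Hurdle AIC: ", round(hurdle_poisson_aic(sample), 2))
```

On this sample the hurdle model gets the lower AIC, in line with the "lower is better" reading; a ten-point sample is far too small to support any general conclusion.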
Kasyoki, Ngesa & Waititu therefore suggested that researchers and other practitioners should always consider the proportion of zeros when modeling count data. The authors also suggested that if the count data are over-dispersed and the proportion of zero counts is below 30%, the negative binomial model should be used; for higher proportions of zeros, the hurdle model should be used. The results in the above graph can therefore be used as a benchmark for choosing the best model for handling over-dispersed count data with different proportions of zeros.
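The suggested rule of thumb can be written as a short Python helper. This is only a sketch of the rule as summarised here, not code from the paper: the function name `suggest_model` is hypothetical, over-dispersion is checked crudely as sample variance greater than sample mean, and the "no recommendation" branch is an assumption, since the suggestion covers only over-dispersed data.

```python
from statistics import mean, variance

ZERO_THRESHOLD = 0.30  # cut-off for the zero proportion reported in the article

def suggest_model(counts):
    """Suggest a count model from over-dispersion and the share of zeros."""
    zero_prop = counts.count(0) / len(counts)
    if variance(counts) <= mean(counts):
        # Assumption: the rule says nothing about data that are not over-dispersed.
        return "no recommendation (data not over-dispersed)"
    if zero_prop < ZERO_THRESHOLD:
        return "negative binomial"
    return "hurdle"

print(suggest_model([0, 1, 1, 2, 3, 5, 8, 13, 2, 4]))   # 10% zeros
print(suggest_model([0, 0, 0, 0, 0, 1, 2, 3, 7, 12]))   # 50% zeros
```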
Alexander Kasyoki Muoka, Anthony Gichuhi Waititu: Department of Basic and Applied Sciences, Jomo Kenyatta University of Agriculture and Technology, Westlands Campus, Kenya. Oscar Owino Ngesa: Taita Taveta University College, Kenya.
A paper about the study appeared recently in the Science Journal of Applied Mathematics and Statistics.