International Journal of Statistical Distributions and Applications
Volume 3, Issue 4, December 2017, Pages: 72-80
Received: Mar. 5, 2017;
Accepted: Mar. 28, 2017;
Published: Nov. 10, 2017
Ahmed Mahmoud Gad, Statistics Department, Faculty of Economics and Political Science, Cairo University, Cairo, Egypt
Rania Hassan Mohamed Abdelkhalek, Department of Statistics, Mathematics and Insurance, Faculty of Commerce, Benha University, Benha, Egypt
Longitudinal studies play an important role in scientific research. The defining characteristic of a longitudinal study is that observations are collected from each subject repeatedly over time or under different conditions. Missing values are common in such studies, and their presence is a fundamental challenge because it can introduce bias even under well-controlled conditions. Three missing data mechanisms are defined: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Several imputation methods have been developed in the literature to handle missing values in longitudinal data. The most commonly used methods include complete case analysis (CCA), mean imputation (Mean), last observation carried forward (LOCF), hot deck (HOT), regression imputation (Regress), K-nearest neighbor (KNN), the expectation maximization (EM) algorithm, and multiple imputation (MI). In this article, a comparative study is conducted to investigate the efficiency of these eight methods under the different missing data mechanisms. The comparison is carried out through a simulation study. It is concluded that MI is the most effective method because it yields the smallest standard errors, while the EM algorithm has the largest relative bias. The methods are also compared in a real data application.
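As a rough illustration of three of the simplest methods named above (CCA, mean imputation and LOCF), the sketch below applies them to a small made-up data set. The data, the use of `None` as the missing-value marker, and the function names are assumptions chosen for illustration; they are not taken from the article, and the article's own simulation design is not reproduced here.

```python
from statistics import mean

# Hypothetical toy data: rows are subjects, columns are visits.
# None marks a missing measurement.
data = [
    [5.0, 6.0, 7.0],
    [4.0, None, None],
    [6.0, 8.0, None],
]

def complete_cases(rows):
    """CCA: keep only subjects observed at every visit."""
    return [row for row in rows if None not in row]

def mean_impute(rows):
    """Mean imputation: replace a missing value with the mean of the
    observed values at the same visit (assumes each visit has at least
    one observed value)."""
    n_visits = len(rows[0])
    visit_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_visits)
    ]
    return [
        [visit_means[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]

def locf(rows):
    """LOCF: carry each subject's last observed value forward.
    A value missing before any observation stays None."""
    result = []
    for r in rows:
        filled, last = [], None
        for v in r:
            if v is not None:
                last = v
            filled.append(last)
        result.append(filled)
    return result

cca_result = complete_cases(data)
mean_result = mean_impute(data)
locf_result = locf(data)
```

On this toy input, CCA discards two of the three subjects, whereas mean imputation and LOCF keep all subjects but fill the gaps with different values. The sketch only shows the mechanics of each method and says nothing about their bias or standard errors.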
Ahmed Mahmoud Gad, Rania Hassan Mohamed Abdelkhalek, Imputation Methods for Longitudinal Data: A Comparative Study, International Journal of Statistical Distributions and Applications, Vol. 3, No. 4, 2017, pp. 72-80.