Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis
International Journal of Data Science and Analysis
Volume 6, Issue 1, February 2020, Pages: 1-11
Received: Nov. 12, 2019; Accepted: Dec. 6, 2019; Published: Jan. 8, 2020
Benard Mworia Warutumo, Department of Statistics and Actuarial Science, Technical University of Kenya, Nairobi, Kenya
Pius Nderitu Kihara, Department of Statistics and Actuarial Science, Technical University of Kenya, Nairobi, Kenya
Levi Mbugua, Department of Statistics and Actuarial Science, Technical University of Kenya, Nairobi, Kenya
The validity and usefulness of empirical data require that the analyst ascertain the cleanliness of the collected data before any statistical analysis commences. In this study, petroleum demand data covering a 24-hour period were collected from 1515 households in 10 clusters. The primary sampling units were stratified into three economic classes: 50% were drawn from the low class, 28% from the medium class, and 22% from the high class. Of the questionnaires, 63.6% were completed; the incomplete data were imputed using multivariate imputation by chained equations (MICE), with the aid of auxiliary information from a past survey. The proportion and pattern of missing data were ascertained, and the study assumed that the data were missing at random. Two nonparametric estimators, Nadaraya-Watson and local polynomial, and one design-based estimator, Horvitz-Thompson, were fitted to estimate the total demand for petroleum, which has no close substitute. The performance of the three estimators was compared, and the study found that the local polynomial approach appeared to be more efficient and competitive, with low bias. The local polynomial estimator also handled boundary bias better than the Nadaraya-Watson and Horvitz-Thompson estimators. The results were used to estimate the long time gaps in petroleum demand in Nairobi County, Kenya.
Clean Data, Missing Data, Imputation, Petroleum Total Demand
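The three estimators named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth h = 0.1, the degree-1 (local linear) fit, the function names, and the toy data are all assumptions chosen for illustration. On noise-free linear data, the local linear fit recovers the true value at the boundary x = 0, while the Nadaraya-Watson fit is pulled toward the interior; this is the boundary bias the abstract refers to.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density, used as the smoothing kernel (an assumed choice).
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x0, x, y, h):
    # Local constant fit: kernel-weighted average of y around each point in x0.
    x0 = np.atleast_1d(x0).astype(float)
    w = gaussian_kernel((x0[:, None] - x[None, :]) / h)
    return (w @ y) / w.sum(axis=1)

def local_linear(x0, x, y, h):
    # Local polynomial fit of degree 1: weighted least squares centred at each x0.
    x0 = np.atleast_1d(x0).astype(float)
    out = np.empty_like(x0)
    for i, c in enumerate(x0):
        w = gaussian_kernel((x - c) / h)
        X = np.column_stack([np.ones_like(x), x - c])
        XtW = X.T * w                      # 2 x n matrix X^T W
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        out[i] = beta[0]                   # intercept = fitted value at c
    return out

def horvitz_thompson_total(y, pi):
    # Design-based total: each sampled value weighted by 1 / inclusion probability.
    return float(np.sum(np.asarray(y) / np.asarray(pi)))

# Toy data (hypothetical): a noise-free line observed on [0, 1].
x = np.linspace(0.0, 1.0, 51)
y = 2.0 + 3.0 * x
h = 0.1

nw0 = nadaraya_watson(0.0, x, y, h)[0]   # biased upward at the left boundary
ll0 = local_linear(0.0, x, y, h)[0]      # reproduces the true value 2.0

# Toy sample (hypothetical) with unequal inclusion probabilities.
ht = horvitz_thompson_total([10.0, 20.0, 30.0], [0.5, 0.25, 0.5])

print(nw0, ll0, ht)
```

In a survey setting, the smoothed fits would typically be evaluated on auxiliary values for non-sampled units and summed to give a population total; that step depends on the sampling design and is left out of this sketch.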
To cite this article
Benard Mworia Warutumo, Pius Nderitu Kihara, Levi Mbugua, Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis, International Journal of Data Science and Analysis. Vol. 6, No. 1, 2020, pp. 1-11. doi: 10.11648/j.ijdsa.20200601.11
Copyright © 2020 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Science Publishing Group