International Journal of Data Science and Analysis

| Peer-Reviewed |

Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis

Received: 12 November 2019    Accepted: 06 December 2019    Published: 08 January 2020


Abstract

The validity and usefulness of empirical data require that the data analyst ascertain the cleanliness of the collected data before any statistical analysis commences. In this study, petroleum demand data for a 24-hour period were collected from 1515 households in 10 clusters. The primary sampling units were stratified into three economic classes: 50% were drawn from the low class, 28% from the medium class and 22% from the high class. Of the questionnaires, 63.6% were completed; the incomplete data were imputed using multivariate imputation by chained equations with the aid of auxiliary information from a past survey. The proportion of missing data and its pattern were ascertained. The study assumed that the data were missing at random. Two nonparametric estimators, Nadaraya-Watson and Local Polynomial, and a design-based estimator, Horvitz-Thompson, were fitted to estimate the total demand for petroleum, which has no close substitute. The performance of the three estimators was compared, and the study found that the Local Polynomial approach appeared to be more efficient and competitive, with low bias. The Local Polynomial estimator handled boundary bias better than the Nadaraya-Watson and Horvitz-Thompson estimators. The results were used to estimate the long time gaps in petroleum demand in Nairobi County, Kenya.
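
The three estimators named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian kernel, the bandwidth `h`, and the toy data are all assumptions chosen for illustration, and the local polynomial is restricted to degree 1 (local linear). The toy data lie on a straight line, which makes the boundary behaviour easy to see.

```python
import math

def horvitz_thompson_total(y, pi):
    """Design-based total: each sampled value is weighted by 1 / inclusion probability."""
    return sum(yi / p for yi, p in zip(y, pi))

def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, x, y, h):
    """Local constant fit: kernel-weighted mean of y around x0."""
    w = [gaussian_kernel((xi - x0) / h) for xi in x]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def local_linear(x0, x, y, h):
    """Local polynomial fit of degree 1: intercept of a kernel-weighted
    least squares line centred at x0."""
    w = [gaussian_kernel((xi - x0) / h) for xi in x]
    s0 = sum(w)
    s1 = sum(wi * (xi - x0) for wi, xi in zip(w, x))
    s2 = sum(wi * (xi - x0) ** 2 for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * (xi - x0) * yi for wi, xi, yi in zip(w, x, y))
    # Solve the 2x2 weighted normal equations for the intercept.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)

# Hypothetical toy data on an exact line y = 2 + 3x, observed on [0, 1].
x = [i / 10 for i in range(11)]
y = [2.0 + 3.0 * xi for xi in x]

# Fitted values at the left boundary x0 = 0, where the true value is 2.
nw_at_boundary = nadaraya_watson(0.0, x, y, h=0.3)
ll_at_boundary = local_linear(0.0, x, y, h=0.3)
print(nw_at_boundary, ll_at_boundary)

# Horvitz-Thompson total for two sampled units with known inclusion probabilities.
ht_total = horvitz_thompson_total([10.0, 20.0], [0.5, 0.25])
print(ht_total)
```

On this toy example the local linear fit recovers the true boundary value of 2 up to rounding, whereas the Nadaraya-Watson fit is pulled above 2 because every neighbouring observation lies on one side of the boundary. This is the boundary bias that the abstract reports the Local Polynomial estimator handling better.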

DOI 10.11648/j.ijdsa.20200601.11
Published in International Journal of Data Science and Analysis (Volume 6, Issue 1, February 2020)
Page(s) 1-11
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Clean Data, Missing Data, Imputation, Petroleum Total Demand

Author Information
  • Department of Statistics and Actuarial Science, Technical University of Kenya, Nairobi, Kenya

  • Department of Statistics and Actuarial Science, Technical University of Kenya, Nairobi, Kenya

  • Department of Statistics and Actuarial Science, Technical University of Kenya, Nairobi, Kenya

Cite This Article
  • APA Style

    Benard Mworia Warutumo, Pius Nderitu Kihara, Levi Mbugua. (2020). Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis. International Journal of Data Science and Analysis, 6(1), 1-11. https://doi.org/10.11648/j.ijdsa.20200601.11


    ACS Style

    Benard Mworia Warutumo; Pius Nderitu Kihara; Levi Mbugua. Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis. Int. J. Data Sci. Anal. 2020, 6(1), 1-11. doi: 10.11648/j.ijdsa.20200601.11


    AMA Style

    Benard Mworia Warutumo, Pius Nderitu Kihara, Levi Mbugua. Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis. Int J Data Sci Anal. 2020;6(1):1-11. doi: 10.11648/j.ijdsa.20200601.11


  • @article{10.11648/j.ijdsa.20200601.11,
      author = {Benard Mworia Warutumo and Pius Nderitu Kihara and Levi Mbugua},
      title = {Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis},
      journal = {International Journal of Data Science and Analysis},
      volume = {6},
      number = {1},
      pages = {1-11},
      doi = {10.11648/j.ijdsa.20200601.11},
      url = {https://doi.org/10.11648/j.ijdsa.20200601.11},
      eprint = {https://download.sciencepg.com/pdf/10.11648.j.ijdsa.20200601.11},
      year = {2020}
    }
    


  • TY  - JOUR
    T1  - Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis
    AU  - Benard Mworia Warutumo
    AU  - Pius Nderitu Kihara
    AU  - Levi Mbugua
    Y1  - 2020/01/08
    PY  - 2020
    N1  - https://doi.org/10.11648/j.ijdsa.20200601.11
    DO  - 10.11648/j.ijdsa.20200601.11
    T2  - International Journal of Data Science and Analysis
    JF  - International Journal of Data Science and Analysis
    JO  - International Journal of Data Science and Analysis
    SP  - 1
    EP  - 11
    PB  - Science Publishing Group
    SN  - 2575-1891
    UR  - https://doi.org/10.11648/j.ijdsa.20200601.11
    VL  - 6
    IS  - 1
    ER  - 

