Journal of Electrical and Electronic Engineering
Volume 4, Issue 3, June 2016, Pages: 51-56
Received: May 21, 2016;
Published: May 24, 2016
Song Yaqi, Department of Computer Science, North China Electric Power University, Baoding, China
Performance is the key issue in power big data applications. One of the main challenges is how to exploit big data technologies to build a power big data processing platform and to facilitate scientific discovery, for example in electric power systems. This paper explores how Spark and cloud computing can accelerate pattern recognition on massive insulator leakage current data. We designed and implemented a parallel KNN (k-nearest neighbor) algorithm using Spark and deployed it on the Aliyun E-MapReduce cloud computing platform. Experimental results show that these technologies can improve both performance and scalability.
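The abstract does not spell out how the KNN computation is split across Spark workers. A common pattern, shown here as a minimal pure-Python sketch rather than the author's implementation, is to have each data partition return its local k nearest neighbors (the role a Spark `mapPartitions` step would play), merge those candidate lists into a global top-k (a reduce step), and then take a majority vote. Euclidean distance, the toy two-feature samples and the class labels "A" and "B" are assumptions for illustration only, not leakage-current features from the paper.

```python
import heapq
import math
from collections import Counter
from functools import reduce


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_top_k(partition, query, k):
    """Map step: the k closest (distance, label) pairs within one partition."""
    return heapq.nsmallest(k, ((euclidean(x, query), y) for x, y in partition))


def merge_top_k(a, b, k):
    """Reduce step: merge two candidate lists, keeping the k closest overall."""
    return heapq.nsmallest(k, a + b)


def parallel_knn(partitions, query, k):
    """Classify `query` by majority vote over its k nearest training samples."""
    candidates = [local_top_k(p, query, k) for p in partitions]  # per partition
    nearest = reduce(lambda a, b: merge_top_k(a, b, k), candidates)  # global
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three "partitions" of (feature_vector, label) samples.
partitions = [
    [((0.1, 0.2), "A"), ((0.2, 0.1), "A")],
    [((5.0, 5.1), "B"), ((0.15, 0.15), "A")],
    [((5.2, 4.9), "B"), ((4.8, 5.0), "B")],
]

if __name__ == "__main__":
    print(parallel_knn(partitions, (0.0, 0.0), 3))  # prints A
    print(parallel_knn(partitions, (5.0, 5.0), 3))  # prints B
```

Merging local top-k lists is exact rather than approximate: every one of the global k nearest samples must also be among the k nearest in its own partition, so the union of the local candidate lists always contains the true answer while only k items per partition need to be shipped to the reducer.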
Song Yaqi. Fast Type Recognition of Missive Insulator Leakage Current Data Using Spark. Journal of Electrical and Electronic Engineering, Vol. 4, No. 3, 2016, pp. 51-56.