A New Similarity Measure for Time Series Data Mining Based on Longest Common Subsequence
American Journal of Data Mining and Knowledge Discovery
Volume 4, Issue 1, June 2019, Pages: 32-45
Received: May 3, 2019;
Accepted: Jun. 3, 2019;
Published: Jun. 20, 2019
Gholamreza Soleimany, Department of Industrial Engineering, Yazd University, Yazd, Iran
Masoud Abessi, Department of Industrial Engineering, Yazd University, Yazd, Iran
This research proposes a new similarity measure for time series data mining, called Developed Longest Common Subsequence (DLCSS). The DLCSS combines the logic of the Longest Common Subsequence (LCSS) method with the concept of similarity in time series data. Most studies on time series data mining refer to the LCSS and Dynamic Time Warping (DTW) as the best and most widely used similarity measures. However, the LCSS was originally designed to measure the similarity of two character sequences and was only later extended to time series by introducing a similarity threshold. The value of this threshold has a large impact on the quality of time series data mining. The DLCSS addresses this weakness by defining two similarity thresholds and determining their values. The performance of the DLCSS is compared with that of the LCSS and DTW using query-by-content and K-medoids clustering on 23 of the UCR datasets. The results indicate, with 90% confidence, that the DLCSS performs better than the LCSS and DTW.
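To illustrate why the similarity threshold matters, the following is a minimal sketch of the classic single-threshold LCSS for numeric series, not the DLCSS itself, whose two-threshold definition is not given in the abstract. The function name `lcss_similarity`, the parameter `epsilon`, the normalization by the shorter series length, and the sample series are all assumptions made for illustration.

```python
from typing import Sequence


def lcss_similarity(a: Sequence[float], b: Sequence[float], epsilon: float) -> float:
    """Threshold-based LCSS similarity, normalized to [0, 1].

    Two points are treated as matching when their absolute difference
    is at most `epsilon` (the similarity threshold; an illustrative name).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    # table[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= epsilon:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Normalizing by the shorter length is one common convention (an assumption here).
    return table[n][m] / min(n, m)


# Hypothetical sample series, chosen only to show the effect of epsilon.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 1.4, 2.1, 3.6]
for eps in (0.05, 0.2, 0.5, 1.0):
    print(eps, lcss_similarity(x, y, eps))
```

On these sample series the reported similarity moves from 0 to 1 as `epsilon` grows, which is the sensitivity to the threshold value that the abstract describes.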