A New Similarity Measure for Time Series Data Mining Based on Longest Common Subsequence

Gholamreza Soleimany; Masoud Abessi

doi:10.11648/j.ajdmkd.20190401.16

Peer-Reviewed

Published in American Journal of Data Mining and Knowledge Discovery (Volume 4, Issue 1)

Received: 3 May 2019 Accepted: 3 June 2019 Published: 20 June 2019

Abstract

In this research, a new similarity measure, named Developed Longest Common Subsequence (DLCSS), is proposed for time series data mining. The main idea of DLCSS is to combine the logic of the Longest Common Subsequence (LCSS) method with the concept of similarity in time series data. Most studies on time series data mining refer to LCSS and Dynamic Time Warping (DTW) as the best and most widely used similarity measures. LCSS, however, was originally designed to measure the similarity of two character sequences and was only later extended to time series by defining a similarity threshold. The value of this threshold has a large impact on the quality of time series data mining. DLCSS addresses this defect by defining two similarity thresholds and determining their values. The performance of DLCSS is compared with that of LCSS and DTW using query-by-content and K-medoids clustering on 23 of the UCR datasets. The results indicate, at a 90% confidence level, that DLCSS performs better than LCSS and DTW.
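
To illustrate why the similarity threshold matters, the following is a minimal sketch of the classic single-threshold LCSS for numeric series, not the DLCSS method itself (the abstract does not specify how its two thresholds are applied). The names lcss_length, lcss_similarity, and epsilon, the sample series, and the normalization by the shorter series length are assumptions made for this example.

```python
from typing import Sequence


def lcss_length(a: Sequence[float], b: Sequence[float], epsilon: float) -> int:
    """Length of the longest common subsequence of two numeric series.

    Two points are treated as "equal" when they differ by at most epsilon
    (the similarity threshold; the name is an assumption of this sketch).
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= epsilon:
                # The points match: extend the best alignment of the prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # No match: skip a point in one series or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a: Sequence[float], b: Sequence[float], epsilon: float) -> float:
    """LCSS length normalized by the shorter series (an assumed convention)."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, epsilon) / min(len(a), len(b))


# Hypothetical sample data, chosen only to show threshold sensitivity.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.4, 3.02, 5.0]
for eps in (0.05, 0.2, 0.5):
    print(eps, lcss_similarity(x, y, eps))
```

On this sample pair the score rises from 0.25 to 0.5 to 0.75 as the threshold grows from 0.05 to 0.2 to 0.5, which is the sensitivity to the threshold value that the abstract describes as a defect of LCSS.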

Page(s)	32-45
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2019. Published by Science Publishing Group

Keywords

Time Series, Data Mining, Similarity Measurement, Longest Common Subsequence, Dynamic Time Warping, Developed Longest Common Subsequence

Cite This Article

APA Style

Gholamreza Soleimany, Masoud Abessi. (2019). A New Similarity Measure for Time Series Data Mining Based on Longest Common Subsequence. American Journal of Data Mining and Knowledge Discovery, 4(1), 32-45. https://doi.org/10.11648/j.ajdmkd.20190401.16

ACS Style

Gholamreza Soleimany; Masoud Abessi. A New Similarity Measure for Time Series Data Mining Based on Longest Common Subsequence. Am. J. Data Min. Knowl. Discov. 2019, 4(1), 32-45. doi: 10.11648/j.ajdmkd.20190401.16

AMA Style

Gholamreza Soleimany, Masoud Abessi. A New Similarity Measure for Time Series Data Mining Based on Longest Common Subsequence. Am J Data Min Knowl Discov. 2019;4(1):32-45. doi: 10.11648/j.ajdmkd.20190401.16

@article{10.11648/j.ajdmkd.20190401.16,
  author = {Gholamreza Soleimany and Masoud Abessi},
  title = {A New Similarity Measure for Time Series Data Mining Based on Longest Common Subsequence},
  journal = {American Journal of Data Mining and Knowledge Discovery},
  volume = {4},
  number = {1},
  pages = {32-45},
  doi = {10.11648/j.ajdmkd.20190401.16},
  url = {https://doi.org/10.11648/j.ajdmkd.20190401.16},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajdmkd.20190401.16},
  year = {2019}
}

TY  - JOUR
T1  - A New Similarity Measure for Time Series Data Mining Based on Longest Common Subsequence
AU  - Gholamreza Soleimany
AU  - Masoud Abessi
Y1  - 2019/06/20
PY  - 2019
N1  - https://doi.org/10.11648/j.ajdmkd.20190401.16
DO  - 10.11648/j.ajdmkd.20190401.16
T2  - American Journal of Data Mining and Knowledge Discovery
JF  - American Journal of Data Mining and Knowledge Discovery
JO  - American Journal of Data Mining and Knowledge Discovery
SP  - 32
EP  - 45
PB  - Science Publishing Group
SN  - 2578-7837
UR  - https://doi.org/10.11648/j.ajdmkd.20190401.16
VL  - 4
IS  - 1
ER  -

Author Information

Gholamreza Soleimany
Department of Industrial Engineering, Yazd University, Yazd, Iran

Masoud Abessi
Department of Industrial Engineering, Yazd University, Yazd, Iran
