With the rapid development of Internet technology, the influence of online consensus continues to expand. How to discover sensitive topics quickly and effectively, and then keep track of them, has recently become an important research problem. Text clustering aggregates news texts with the same or similar content, so that topics can be discovered automatically. The main research direction is to improve clustering algorithms according to the media type. Although existing typical clustering algorithms have certain advantages, they all face constraints on data size and data characteristics in specific applications, and no existing algorithm fully adapts to these characteristics. The Single-pass algorithm, which is widely applied in the Topic Detection and Tracking (TDT) field, can discover and track topics, but it suffers from poor accuracy and slow speed on massive data. Based on the dynamic evolution characteristics of online consensus, this paper proposes an incremental text clustering algorithm based on Single-pass that improves clustering accuracy and efficiency on massive news data. We conduct an experiment on real online news texts from an online consensus analysis system to verify the feasibility and effectiveness of the proposed algorithm. The results show that the new algorithm is much more efficient than the original Single-pass clustering algorithm. In real applications, the new incremental text clustering algorithm basically meets the real-time demands of online topic detection and has practical value.
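For readers unfamiliar with the baseline, the following is a minimal sketch of the classic Single-pass clustering scheme that the abstract refers to, not the improved incremental algorithm proposed in the paper. The whitespace tokenizer, the raw term-count vectors, the cosine similarity measure, the summed-count cluster representation, the function names, and the threshold value of 0.3 are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each document, in arrival order, to its most similar cluster,
    or open a new cluster when no similarity reaches the threshold.
    The threshold of 0.3 is an illustrative assumption."""
    centroids = []  # one summed term-count Counter per cluster
    labels = []     # cluster index assigned to each document
    for text in docs:
        vec = Counter(text.lower().split())  # assumed tokenizer: whitespace
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(vec)        # the document seeds a new cluster
            labels.append(len(centroids) - 1)
    return labels


docs = [
    "flood hits coastal city",
    "coastal city flood damage",
    "stock market rally continues",
    "market rally lifts stock prices",
]
print(single_pass(docs))
```

Each incoming document is compared against every existing cluster, so the cost per document grows with the number of clusters. This is consistent with the slow speed on massive data that the abstract attributes to the original algorithm.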
Published in | International Journal of Information and Communication Sciences (Volume 3, Issue 3) |
DOI | 10.11648/j.ijics.20180303.12 |
Page(s) | 88-95 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2018. Published by Science Publishing Group |
Topic Detection, Online Consensus, Simhash Algorithm, Text Clustering, Incremental Algorithm, Single-Pass Algorithm
APA Style
Yuejin Zhang, Jiajia Zhang, Dongmei Zhao. (2018). Text Clustering Incremental Algorithm in Sensitive Topic Detection. International Journal of Information and Communication Sciences, 3(3), 88-95. https://doi.org/10.11648/j.ijics.20180303.12
ACS Style
Yuejin Zhang; Jiajia Zhang; Dongmei Zhao. Text Clustering Incremental Algorithm in Sensitive Topic Detection. Int. J. Inf. Commun. Sci. 2018, 3(3), 88-95. doi: 10.11648/j.ijics.20180303.12
AMA Style
Yuejin Zhang, Jiajia Zhang, Dongmei Zhao. Text Clustering Incremental Algorithm in Sensitive Topic Detection. Int J Inf Commun Sci. 2018;3(3):88-95. doi: 10.11648/j.ijics.20180303.12
@article{10.11648/j.ijics.20180303.12,
  author = {Yuejin Zhang and Jiajia Zhang and Dongmei Zhao},
  title = {Text Clustering Incremental Algorithm in Sensitive Topic Detection},
  journal = {International Journal of Information and Communication Sciences},
  volume = {3},
  number = {3},
  pages = {88-95},
  doi = {10.11648/j.ijics.20180303.12},
  url = {https://doi.org/10.11648/j.ijics.20180303.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijics.20180303.12},
  year = {2018}
}
TY  - JOUR
T1  - Text Clustering Incremental Algorithm in Sensitive Topic Detection
AU  - Yuejin Zhang
AU  - Jiajia Zhang
AU  - Dongmei Zhao
Y1  - 2018/10/30
PY  - 2018
N1  - https://doi.org/10.11648/j.ijics.20180303.12
DO  - 10.11648/j.ijics.20180303.12
T2  - International Journal of Information and Communication Sciences
JF  - International Journal of Information and Communication Sciences
JO  - International Journal of Information and Communication Sciences
SP  - 88
EP  - 95
PB  - Science Publishing Group
SN  - 2575-1719
UR  - https://doi.org/10.11648/j.ijics.20180303.12
VL  - 3
IS  - 3
ER  -