Text Clustering Incremental Algorithm in Sensitive Topic Detection
International Journal of Information and Communication Sciences
Volume 3, Issue 3, September 2018, Pages: 88-95
Received: Aug. 28, 2018;
Accepted: Sep. 27, 2018;
Published: Oct. 30, 2018
Yuejin Zhang, Cyber Security Department, Municipal Cyberspace Administration, Beijing, China
Jiajia Zhang, Eliot K-8 Innovation School, Boston, USA
Dongmei Zhao, Department of Electronic Commerce, China Agricultural University, Beijing, China
With the rapid development of Internet technology, the influence of online consensus continues to expand. How to discover sensitive topics quickly and effectively, and how to keep track of them, has recently become an important research problem. Text clustering can aggregate news texts with the same or similar content so that topics are discovered automatically. Improving clustering algorithms for different media types is the main research direction. Although existing typical clustering algorithms have certain advantages, they all face constraints from data size and data characteristics in specific applications, and no existing algorithm fully adapts to these characteristics. Single-pass algorithms, which are widely applied in the Topic Detection and Tracking (TDT) field, can discover and track topics, but they suffer from poor accuracy and slow speed on massive data. Based on the dynamic evolution characteristics of online consensus, this paper proposes an incremental text clustering algorithm based on Single-pass, which improves clustering accuracy and efficiency for massive news data. We conduct experiments on real online news texts from an online consensus analysis system to verify the feasibility and effectiveness of the proposed algorithm. The results show that the new algorithm is much more efficient than the original Single-pass clustering algorithm. In real applications, the new incremental text clustering algorithm basically meets the real-time demands of online topic detection and has practical value.
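For readers unfamiliar with the baseline, the classic Single-pass procedure that the abstract compares against can be sketched as below. This is not the authors' incremental algorithm. It is a minimal illustration that assumes whitespace tokenization, raw term-frequency vectors, cosine similarity and an illustrative threshold of 0.3; the function names `cosine` and `single_pass` and the threshold value are hypothetical. Each document is read once, compared against every existing cluster centroid, and either joins the most similar cluster or starts a new one.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(docs, threshold=0.3):
    """Classic Single-pass clustering (hypothetical sketch, not the paper's method).

    Documents are processed once, in arrival order. Returns a list of
    clusters, each a list of document indices.
    """
    centroids = []  # summed term vectors, one per cluster (cosine ignores scale)
    clusters = []   # document indices belonging to each cluster
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())  # assumed: whitespace tokenization
        best, best_sim = -1, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(i)
            centroids[best].update(vec)  # Counter.update adds term counts
        else:
            clusters.append([i])          # no cluster is similar enough
            centroids.append(Counter(vec))
    return clusters


if __name__ == "__main__":
    news = [
        "flood warning issued for river city",
        "river city flood warning extended",
        "new smartphone released with faster chip",
        "smartphone chip shortage hits release",
    ]
    print(single_pass(news))  # [[0, 1], [2, 3]]
```

Because every new document is compared with every existing cluster, the cost grows with both the number of documents and the number of clusters, and the result depends on arrival order. These are the weaknesses on massive data that the abstract attributes to the original Single-pass algorithm.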
Topic Detection, Online Consensus, Simhash Algorithm, Text Clustering, Incremental Algorithm, Single-Pass Algorithm
To cite this article
Yuejin Zhang, Jiajia Zhang, Dongmei Zhao. Text Clustering Incremental Algorithm in Sensitive Topic Detection. International Journal of Information and Communication Sciences. Vol. 3, No. 3, 2018, pp. 88-95.
Copyright © 2018. The authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.