An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies
International Journal of Intelligent Information Systems
Volume 3, Issue 1, February 2014, Pages: 1-7
Received: Jan. 19, 2014;
Published: Feb. 20, 2014
Masafumi Matsuhara, Department of Software and Information Science, Iwate Prefectural University, Iwate, Japan
Toshihiro Yoshida, NTT Advanced Technology Corporation, Kanagawa, Japan
In recent years, the amount of information on the World Wide Web has exploded. Search engines are generally used for web searching; however, robot-type search engines have a few problems. One such problem is that it is difficult for a user to formulate a query that returns the search results she or he intends. Moreover, it is difficult for users to grasp the contents of the search results because a robot-type search engine outputs many results as one long list. To address these problems, many methods have been proposed that classify the results of a robot-type search engine into clusters, which are labeled and then shown to the user. To be effective, a cluster label must consist of words that appropriately describe the web sites within the cluster. In this study, we propose a labeling method using concordant document frequencies: the web search results for a query are classified into clusters, and our technique assigns a label to each cluster. For each candidate label, we obtain the set of web sites returned by an AND-query that combines the original query word with the candidate label. If this set has many web sites in common with the members of the cluster, the concordant document frequency is high, and the candidate label is given a high weight. Because the weight reflects the actual members of the cluster, the proposed cluster-aware method can assign an appropriate label. We demonstrate the effectiveness of the proposed method through simulation experiments.
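The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no exact formula, so the sketch assumes the concordant document frequency is simply the number of AND-query results that are also members of the cluster. The function names, the `search` callback, and the toy index are all hypothetical and stand in for a real search engine.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def concordant_document_frequency(cluster_urls: Set[str],
                                  and_query_urls: Iterable[str]) -> int:
    """Assumed definition: count AND-query results that are also cluster members."""
    return len(cluster_urls & set(and_query_urls))


def choose_label(query: str,
                 cluster_urls: Set[str],
                 candidate_labels: List[str],
                 search: Callable[[str], List[str]]) -> Tuple[str, Dict[str, int]]:
    """Weight each candidate label by its concordant document frequency
    and return the highest-weighted label together with all weights."""
    weights: Dict[str, int] = {}
    for label in candidate_labels:
        # AND-query: the original query word combined with the candidate label.
        results = search(f"{query} {label}")
        weights[label] = concordant_document_frequency(cluster_urls, results)
    best = max(weights, key=weights.get)
    return best, weights


# Hypothetical toy index standing in for a real search engine.
TOY_INDEX: Dict[str, List[str]] = {
    "jaguar car": ["site-a", "site-b", "site-x"],
    "jaguar animal": ["site-c", "site-y", "site-z"],
}


def toy_search(and_query: str) -> List[str]:
    """Return the toy result list for an AND-query (empty if unknown)."""
    return TOY_INDEX.get(and_query, [])


# One cluster taken from the results of the original query "jaguar".
cluster = {"site-a", "site-b", "site-c"}
best_label, label_weights = choose_label("jaguar", cluster, ["car", "animal"], toy_search)
print(best_label, label_weights)
```

In this toy setting, the label whose AND-query results share the most web sites with the cluster receives the highest weight. A normalized overlap (for example, dividing by the cluster size) would be an equally plausible reading of the abstract.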