An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies

Masafumi Matsuhara; Toshihiro Yoshida

doi:10.11648/j.ijiis.20140301.11

Peer-Reviewed

Published in International Journal of Intelligent Information Systems (Volume 3, Issue 1)

Received: 19 January 2014 Published: 20 February 2014

Abstract

In recent years, the amount of information on the World Wide Web has exploded. Search engines are generally used for web searching; however, robot-type search engines have several problems. One is that it is difficult for a user to formulate a query that returns the search results he or she intends. Another is that it is difficult for users to grasp the contents of the search results, because a robot-type search engine outputs many results as one long list. To solve these problems, many methods have been proposed that classify the results of a robot-type search engine into clusters, which are labeled and then shown to the user. To be effective, a cluster label needs to consist of words that appropriately describe the web sites within the cluster. In this study, we propose a labeling method using concordant document frequencies: the web search results for a query are classified into clusters, and our technique assigns a proper label to each cluster. We find the set of web sites returned by an AND-query that combines the original query word with the cluster label. If this set and the members of the cluster have many web sites in common, we say that the concordant document frequency is high, and the cluster label is assigned a high weight. Thus, our proposed cluster-aware method can assign an appropriate label to each cluster. We demonstrate the effectiveness of the proposed method through simulation experiments.
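
As a rough illustration of the weighting idea described in the abstract, the following Python sketch counts how many web sites a cluster shares with the results of an AND-query. It is a minimal sketch under assumptions, not the authors' implementation: the names `concordant_document_frequency`, `weigh_labels`, and `fake_search`, the example sites, and the use of the raw overlap count as the weight are all hypothetical, since the abstract does not give the exact weighting formula.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical search backend: takes (query, label) and returns the set of
# web sites matching "query AND label". A real system would call a search engine.
SearchFn = Callable[[str, str], Set[str]]


def concordant_document_frequency(cluster_sites: Set[str],
                                  and_query_sites: Set[str]) -> int:
    """Number of web sites found both in the cluster and in the AND-query results."""
    return len(cluster_sites & and_query_sites)


def weigh_labels(query: str,
                 cluster_sites: Set[str],
                 candidates: Iterable[str],
                 search: SearchFn) -> Tuple[str, Dict[str, int]]:
    """Weigh each candidate label by its overlap count and return the best one.

    Assumption: the weight is the raw overlap count; the paper may define
    or normalize the weight differently.
    """
    weights = {
        label: concordant_document_frequency(cluster_sites, search(query, label))
        for label in candidates
    }
    best = max(weights, key=lambda label: weights[label])
    return best, weights


# Toy data standing in for real search results (all values are made up).
FAKE_INDEX: Dict[Tuple[str, str], Set[str]] = {
    ("jaguar", "car"): {"a.example", "b.example", "c.example"},
    ("jaguar", "animal"): {"a.example", "d.example"},
}


def fake_search(query: str, label: str) -> Set[str]:
    """Look up the AND-query results in the toy index."""
    return FAKE_INDEX.get((query, label), set())


cluster = {"a.example", "b.example", "e.example"}
best_label, weights = weigh_labels("jaguar", cluster, ["car", "animal"], fake_search)
print(best_label, weights)
```

In this toy run the label "car" shares two sites with the cluster and "animal" shares one, so "car" receives the higher weight.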

Published in	International Journal of Intelligent Information Systems (Volume 3, Issue 1)
DOI	10.11648/j.ijiis.20140301.11
Page(s)	1-7
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2014. Published by Science Publishing Group

Keywords

Labeling, Clustering, Web Search

Cite This Article

APA Style

Masafumi Matsuhara, Toshihiro Yoshida. (2014). An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies. International Journal of Intelligent Information Systems, 3(1), 1-7. https://doi.org/10.11648/j.ijiis.20140301.11

Author Information

Masafumi Matsuhara

Department of Software and Information Science, Iwate Prefectural University, Iwate, Japan

Toshihiro Yoshida

NTT Advanced Technology Corporation, Kanagawa, Japan
