Peer-Reviewed

An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies

Received: 19 January 2014    Published: 20 February 2014
Abstract

In recent years, the amount of information on the World Wide Web has exploded. Search engines are generally used for web searching; however, robot-type search engines have a few problems. One is that it is difficult for a user to come up with an appropriate query that returns the intended search results. Another is that it is difficult for users to grasp the contents of the search results, because a robot-type search engine outputs many results as one long list. To solve these problems, many methods have been proposed that classify the results of a robot-type search engine into clusters, which are then labeled and shown to the user. To be effective, a cluster label needs to consist of words that appropriately describe the web sites within the cluster. In this study, we propose a labeling method based on concordant document frequencies: the web search results of a query are classified into clusters, and our technique assigns a proper label to each cluster. We find the set of web sites returned by an AND-query that combines the original query word with the cluster label. If this set and the members of the cluster have many web sites in common, we say that the concordant document frequency is high, and the cluster label is assigned a high weight. Thus, our proposed cluster-aware method makes it possible to assign an appropriate label. We demonstrate the effectiveness of the proposed method through simulation experiments.
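
The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the normalization by cluster size is an assumption, and the toy corpus `PAGES` and the names `toy_and_search`, `label_weights`, and `pick_label` are hypothetical stand-ins for a real search engine and for the paper's procedure.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical toy corpus: URL -> words on that page (illustrative data only).
PAGES: Dict[str, Set[str]] = {
    "a.example/car": {"jaguar", "car", "dealer"},
    "b.example/car": {"jaguar", "car", "engine"},
    "c.example/cat": {"jaguar", "animal", "cat"},
    "d.example/zoo": {"jaguar", "animal", "zoo"},
}


def toy_and_search(terms: Iterable[str]) -> Set[str]:
    """Stand-in for a search engine AND-query: pages containing every term."""
    wanted = set(terms)
    return {url for url, words in PAGES.items() if wanted <= words}


def label_weights(
    query: str,
    cluster: Set[str],
    candidates: Iterable[str],
    and_search: Callable[[Iterable[str]], Set[str]] = toy_and_search,
) -> Dict[str, float]:
    """Weight each candidate label by how many AND-query hits fall in the cluster.

    Assumption: the concordant count is divided by the cluster size so that
    weights lie in [0, 1]; the abstract only says a large overlap gives a
    high weight.
    """
    weights: Dict[str, float] = {}
    for label in candidates:
        hits = and_search([query, label])   # results of "query AND label"
        concordant = len(hits & cluster)    # pages shared with the cluster
        weights[label] = concordant / len(cluster) if cluster else 0.0
    return weights


def pick_label(weights: Dict[str, float]) -> str:
    """Return the candidate with the highest weight (first one wins on ties)."""
    return max(weights, key=weights.get)


if __name__ == "__main__":
    cluster = {"a.example/car", "b.example/car"}
    weights = label_weights("jaguar", cluster, ["car", "animal", "engine"])
    print(weights, pick_label(weights))
```

In this toy setting, the candidate "car" retrieves exactly the two pages in the cluster, "engine" retrieves one of them, and "animal" retrieves none, so "car" receives the highest weight. A real system would replace `toy_and_search` with calls to an actual search engine and would draw candidate labels from the clustered results.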

Published in International Journal of Intelligent Information Systems (Volume 3, Issue 1)
DOI 10.11648/j.ijiis.20140301.11
Page(s) 1-7
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Labeling, Clustering, Web Search

Cite This Article
  • APA Style

    Masafumi Matsuhara, Toshihiro Yoshida. (2014). An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies. International Journal of Intelligent Information Systems, 3(1), 1-7. https://doi.org/10.11648/j.ijiis.20140301.11

    ACS Style

    Masafumi Matsuhara; Toshihiro Yoshida. An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies. Int. J. Intell. Inf. Syst. 2014, 3(1), 1-7. doi: 10.11648/j.ijiis.20140301.11

    AMA Style

    Masafumi Matsuhara, Toshihiro Yoshida. An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies. Int J Intell Inf Syst. 2014;3(1):1-7. doi: 10.11648/j.ijiis.20140301.11

  • @article{10.11648/j.ijiis.20140301.11,
      author = {Masafumi Matsuhara and Toshihiro Yoshida},
      title = {An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies},
      journal = {International Journal of Intelligent Information Systems},
      volume = {3},
      number = {1},
      pages = {1-7},
      doi = {10.11648/j.ijiis.20140301.11},
      url = {https://doi.org/10.11648/j.ijiis.20140301.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijiis.20140301.11},
      year = {2014}
    }
    

  • TY  - JOUR
    T1  - An Effective Cluster-Aware Labeling Method for Web Search Results Using Concordant Document Frequencies
    AU  - Masafumi Matsuhara
    AU  - Toshihiro Yoshida
    Y1  - 2014/02/20
    PY  - 2014
    N1  - https://doi.org/10.11648/j.ijiis.20140301.11
    DO  - 10.11648/j.ijiis.20140301.11
    T2  - International Journal of Intelligent Information Systems
    JF  - International Journal of Intelligent Information Systems
    JO  - International Journal of Intelligent Information Systems
    SP  - 1
    EP  - 7
    PB  - Science Publishing Group
    SN  - 2328-7683
    UR  - https://doi.org/10.11648/j.ijiis.20140301.11
    VL  - 3
    IS  - 1
    ER  - 

Author Information
  • Department of Software and Information Science, Iwate Prefectural University, Iwate, Japan

  • NTT Advanced Technology Corporation, Kanagawa, Japan
