The Anatomy of a Hierarchical Clustering Engine for Web-page, News and Book Snippets
ICDM (2004)
Abstract
In this paper, we investigate the web snippet hierarchical clustering problem in its full extent by devising an algorithmic solution, and a software prototype called SnakeT (accessible at http://roquefort.di.unipi.it/), that: (1) draws the snippets from 16 Web search engines, the Amazon collection of books a9.com, the news of Google News and the blogs of Blogline; (2) builds the clusters on-the-fly (ephemeral clustering) in response to a user query without adopting any pre-defined organization in categories; (3) labels the clusters with sentences of variable length, drawn from the snippets and possibly missing some terms, provided they are not too many;
Keywords
web pages, information retrieval, linear time, hierarchical clustering, knowledge based systems, search engines