Web spam challenge proposal for filtering in archives

International World Wide Web Conferences (2009)

Abstract
In this paper we propose new tasks for a possible future Web Spam Challenge, motivated by the needs of the archival community. The Web archival community consists of several relatively small institutions that operate independently, possibly over different top-level domains (TLDs). Each of them may hold a large set of historic crawls. Efficient filtering would therefore require (1) enhanced use of the time series of domain snapshots and (2) collaboration by transferring models across different TLDs. Corresponding Challenge tasks could hence include the distribution of crawl snapshot data for feature generation, as well as the classification of unlabeled new crawls of the same or even different TLDs.
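As an illustration of what "use of the time series of domain snapshots" for feature generation might look like, the sketch below aggregates per-crawl host features into temporal features (mean, spread, average change between consecutive crawls, and latest value). This is not the method from the paper: the input format, the feature names such as `num_pages` and `outlink_ratio`, and the function `temporal_features` are hypothetical assumptions chosen for the example.

```python
from statistics import mean, pstdev


def temporal_features(snapshots: list[dict[str, float]]) -> dict[str, float]:
    """Aggregate per-snapshot host features over a crawl time series.

    Hypothetical input: `snapshots` is a list of feature dicts for one host,
    ordered from oldest to newest crawl, all sharing the same keys.
    """
    if not snapshots:
        raise ValueError("need at least one snapshot")
    features: dict[str, float] = {}
    for name in snapshots[0]:
        series = [s[name] for s in snapshots]
        features[f"{name}_mean"] = mean(series)
        # Population standard deviation of the feature across crawls.
        features[f"{name}_std"] = pstdev(series)
        # Mean absolute change between consecutive crawls; 0.0 for a single snapshot.
        deltas = [abs(b - a) for a, b in zip(series, series[1:])]
        features[f"{name}_mean_abs_change"] = mean(deltas) if deltas else 0.0
        features[f"{name}_latest"] = series[-1]
    return features


# Usage with made-up snapshots of one host across three crawls.
host_history = [
    {"num_pages": 10, "outlink_ratio": 0.2},
    {"num_pages": 30, "outlink_ratio": 0.4},
    {"num_pages": 20, "outlink_ratio": 0.6},
]
print(temporal_features(host_history))
```

Feature vectors built this way do not depend on TLD-specific vocabulary, which is one possible (assumed) reason they could be shared when a classifier trained on one archive's labeled hosts is applied to unlabeled crawls of another TLD.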
Keywords
different TLDs, web archival community, domain snapshot, new task, challenge task, archival community, crawl snapshot data, possible future web spam, enhanced use, web spam challenge proposal, different top level domain, time series, information retrieval, evaluation, web spam