Combating Spamdexing: Incorporating Heuristics in Link-Based Ranking

ALGORITHMS AND MODELS FOR THE WEB-GRAPH (2008)

Abstract
Users typically locate useful Web pages by querying a search engine. However, today's search engines are seriously threatened by malicious spam pages that attempt to subvert the unbiased searching and ranking services the engines provide. Given the large fraction of Web traffic that originates from search engine referrals, and the high potential monetary value of this traffic, it is not surprising that some Web site owners try to manipulate a search engine's ranking function, giving rise to Web spam. Because algorithmic identification of spam is very difficult, most techniques require either human assistance or extensive training to deal with spam effectively. We explore the possibility of automatically reducing the number of Web spam pages in a Web collection by analyzing the Web graph, coupled with very simple content analysis. We present an empirical evaluation of our approach on 1 million Web pages from the health domain. Our results indicate that the approach can filter out a significant fraction of Web spam pages.
Keywords
web collection,malicious spam page,web traffic,web graph,web site owner,useful web page,combating spamdexing,web spam,link-based ranking,search engine,incorporating heuristics,million web page,web spam page,content analysis,service provider,web pages