Using deep belief network to demote web spam

Future Generation Computer Systems (2021)

Abstract
Many score-propagation-based Web Spam Demotion Algorithms (WSDAs) have been proposed in the last decade. These algorithms face two major challenges. First, score-propagation-based WSDAs are non-incremental, which restricts their real-world applications: the Web changes rapidly, and running the algorithm on the entire Web graph is computationally expensive. Second, score-propagation-based WSDAs use only the link structure of the Web graph to demote Web spam, so they are vulnerable to other kinds of spamming techniques, such as content spam. In this paper, we propose a preference-based learning-to-rank method to address these issues. Our proposal consists of two components: a preference function and an ordering algorithm. The preference function is modeled by a Deep Belief Network (DBN), which can benefit from unlabeled data for better generalization. The proposed Incremental Probabilistic Ordering Algorithm (IPOA) uses the trained preference function to calculate the top-ranking probabilities of Web pages and then uses those probabilities for the final ranking. The complex problem of ranking objects (i.e., Web pages) is thus reduced to a problem of ranking real numbers, which can be solved efficiently by a classical sorting algorithm. We conduct experiments comparing our proposal with conventional score-propagation-based WSDAs, as well as with some popular preference-based learning-to-rank algorithms, on two publicly available datasets, WEBSPAM-UK2006 and WEBSPAM-UK2007. The experimental results demonstrate the superiority of the proposed method. Specifically, compared with score-propagation-based WSDAs, we obtain an absolute improvement of 0.0074 (0.7% relative) on WEBSPAM-UK2006 and 0.065 (7.3% relative) on WEBSPAM-UK2007 in terms of spam demotion score.
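The reduction described in the abstract, from ranking Web pages to sorting real numbers, can be illustrated with a minimal Python sketch. It is not the authors' IPOA. The preference function `toy_preference` is a hypothetical logistic stand-in for the trained DBN, and its two features and weights are invented. Each page's top-ranking probability is approximated here by its average preference probability against a fixed reference set of pages. Scoring against a fixed reference set is an assumption, chosen so that a newly crawled page can be scored and inserted into the sorted ranking without rescoring the pages already there.

```python
import bisect
import math
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Preference = Callable[[Features, Features], float]


def toy_preference(u: Features, v: Features) -> float:
    """Hypothetical stand-in for the trained DBN preference function.

    Returns an estimate of P(u should be ranked above v) by applying a
    logistic function to the difference of fixed linear scores. The
    features (0: content quality, 1: link-spam signal) and weights are
    invented for illustration.
    """
    weights = (1.0, -2.0)
    diff = sum(w * (a - b) for w, a, b in zip(weights, u, v))
    return 1.0 / (1.0 + math.exp(-diff))


def win_probability(page: Features, reference: Sequence[Features],
                    pref: Preference) -> float:
    """Average probability that `page` outranks a page from the reference set."""
    return sum(pref(page, r) for r in reference) / len(reference)


class IncrementalRanking:
    """Keeps pages sorted by their (assumed) top-ranking probability."""

    def __init__(self, reference: Sequence[Features], pref: Preference):
        if not reference:
            raise ValueError("reference set must be non-empty")
        self.reference = list(reference)
        self.pref = pref
        # Sorted list of (-score, name): highest score first, ties broken by name.
        self._entries: List[Tuple[float, str]] = []

    def add(self, name: str, features: Features) -> float:
        """Score one new page and insert it without rescoring existing pages."""
        score = win_probability(features, self.reference, self.pref)
        bisect.insort(self._entries, (-score, name))
        return score

    def ranking(self) -> List[str]:
        return [name for _, name in self._entries]


# Usage: a hypothetical reference set and three pages arriving one at a time.
reference = [(0.5, 0.5), (0.2, 0.8), (0.9, 0.1)]
ranker = IncrementalRanking(reference, toy_preference)
ranker.add("news-site", (0.9, 0.05))
ranker.add("link-farm", (0.1, 0.95))
ranker.add("blog", (0.6, 0.3))
print(ranker.ranking())  # ['news-site', 'blog', 'link-farm']
```

Because each score depends only on the page itself and the fixed reference set, adding a page costs one pass over the reference set plus a sorted insertion via `bisect.insort`. The paper's actual incremental scheme and its definition of top-ranking probability may differ from this approximation.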
Keywords
Web spam demotion, Deep belief network, Learning to rank