Vandalism Detection in Wikidata
ACM International Conference on Information and Knowledge Management (2016)
Abstract
Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation, which can be edited by anyone. Its knowledge is increasingly used within Wikipedia as well as in all kinds of information systems, which imposes high demands on its integrity. Nevertheless, Wikidata is frequently vandalized, exposing all its users to the risk of spreading vandalized and falsified information. In this paper, we present a new machine learning-based approach to detect vandalism in Wikidata. We engineer 47 features that exploit both content and context information, and we report on four classifiers of increasing effectiveness tailored to this learning task. Our approach is evaluated on the recently published Wikidata Vandalism Corpus WDVC-2015 and achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.991, thereby significantly outperforming the state of the art represented by the rule-based Wikidata Abuse Filter (0.865 ROC-AUC) and a prototypical vandalism detector recently introduced by Wikimedia (0.868 ROC-AUC).
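Since the abstract compares detectors by ROC-AUC, a minimal sketch of how that metric can be computed may help. It uses the pairwise-ranking definition: the fraction of (vandalism, regular) revision pairs in which the vandalism revision receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Return ROC-AUC via the pairwise-ranking definition.

    labels: 1 for a vandalism revision, 0 for a regular one (assumed encoding).
    scores: classifier scores, higher meaning "more likely vandalism".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six revisions, two of them vandalism.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.05, 0.20, 0.90, 0.40, 0.35, 0.10]
auc = roc_auc(labels, scores)
print(auc)
```

The quadratic loop is only meant to keep the definition visible; on a corpus-sized data set, a rank-based implementation such as the one in a standard machine learning library would be the practical choice.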
Keywords
Data Quality, Knowledge Base, Vandalism