Source Retrieval Model Focused on Aggregation for plagiarism detection

Kong Lei-lei, Han Zhong-yuan, Qi Hao-liang, Yang Mu-yun

Information Sciences (2019)

Abstract
Source retrieval for plagiarism detection retrieves documents that may be the sources of plagiarism for a suspicious document while minimizing the retrieval costs. Compared with traditional information retrieval, queries from the same suspicious document retain their contextual relations and are no longer isolated from each other. The correlation of queries leads to the aggregation of search results; the documents retrieved by different queries may be related or identical. However, previous studies have failed to devote sufficient attention to the aggregation of search results in source retrieval. In this paper, the task of source retrieval is formalized into a framework of learning to rank, and a Ranking Logistical Regression Model is utilized to implement the framework. Furthermore, addressing the aggregation of search results, we propose the Source Retrieval Model Focused on Aggregation for plagiarism detection. We evaluate various aspects of the proposed model on the PAN 2013 and the PAN 2014 Plagiarism Source Retrieval Corpus. With respect to established baselines, the experimental results indicate that the Source Retrieval Model Focused on Aggregation yields statistically significant improvements in source retrieval performance with equal or lower cost effectiveness.
Keywords
Source retrieval, Plagiarism detection, Learning to rank, Aggregation
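
The abstract's central idea, that results retrieved by different queries from the same suspicious document overlap and can be aggregated before ranking, can be illustrated with a minimal sketch. This is not the paper's model: the two features (the fraction of queries that retrieved a document and its mean reciprocal rank), the function names, and the weights and bias are all hypothetical, and a plain logistic function stands in for a trained ranking model.

```python
import math
from collections import defaultdict


def aggregate_results(results_per_query):
    """Merge the ranked lists of several queries into per-document features.

    results_per_query: list of ranked lists of document ids, one per query.
    Returns {doc_id: (hit_fraction, mean_reciprocal_rank)}.
    Both features are illustrative assumptions, not taken from the paper.
    """
    hits = defaultdict(int)
    rr_sum = defaultdict(float)
    for ranked in results_per_query:
        seen = set()
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in seen:  # count a document once per query
                continue
            seen.add(doc_id)
            hits[doc_id] += 1
            rr_sum[doc_id] += 1.0 / rank
    n = len(results_per_query)
    return {d: (hits[d] / n, rr_sum[d] / n) for d in hits}


def score(features, weights=(2.0, 1.0), bias=-1.0):
    """Logistic score of one document; weights and bias are made-up values."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_sources(results_per_query):
    """Order candidate source documents by descending logistic score."""
    feats = aggregate_results(results_per_query)
    return sorted(feats, key=lambda d: score(feats[d]), reverse=True)
```

For example, `rank_sources([["a", "b", "c"], ["b", "a"], ["b", "d"]])` places `"b"` first because all three queries retrieved it. In a learning-to-rank setting the weights would be fitted on labeled data rather than fixed by hand.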