Candidate document retrieval for Arabic-based text reuse detection on the web

2016 12th International Conference on Innovations in Information Technology (IIT)(2016)

引用 2|浏览12
暂无评分
摘要
Given an input document d, the problem of local text reuse detection is to detect from a given documents collection, all the possible reused passages between d and the other documents. Comparing the passages of document d with the passages of every other document in the collection is obviously infeasible especially with large collections such as the Web. Therefore, selecting a subset of the documents that potentially contains reused text with d becomes a major step in the detection problem. This paper describes a new efficient approach of query formulation to retrieve Arabic-based candidate source documents from the Web. We evaluated the work using a collection of documents especially constructed for this work. The experiments show that on average, 79.97% of the Web documents used in the reused cases were successfully retrieved.
更多
查看译文
关键词
Web Document Retrieval,Query Generation,Text Reuse Detection,Fingerprinting
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要