TA-DRD: A Three-step Automatic Duplicate Record Detection

Yongquan Dong, Ping Ling, Yali Liu, Qiang Chu

The Open Automation and Control Systems Journal (2014)

Abstract
Duplicate record detection is a key step in Deep Web data integration, but existing approaches do not adapt to its large-scale nature. This paper proposes a three-step automatic approach for duplicate record detection in the Deep Web. First, it uses a cluster ensemble to select the initial training instances. Then, it uses tri-training classification to construct a classification model. Finally, it uses evidence theory to combine the results of multiple classification models into a domain-level duplicate record detection model, which can be used for large-scale duplicate record detection within the same domain. Experimental results show that the proposed approach outperforms previous work and that the domain-level duplicate record detection model achieves high performance.
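As a rough illustration of the final step, the sketch below fuses the outputs of several classifiers using Dempster's rule of combination, the standard combination rule in Dempster-Shafer evidence theory. The abstract does not say which combination rule or mass assignment the paper uses, so the rule choice, the two-hypothesis frame (`dup`, `non`), the `combine` helper, and the numeric masses are all assumptions made for illustration only.

```python
from functools import reduce
from itertools import product

# Hypothetical frame of discernment for one record pair.
DUP = frozenset({"dup"})      # the pair is a duplicate
NON = frozenset({"non"})      # the pair is not a duplicate
EITHER = DUP | NON            # mass left on the whole frame (uncertainty)


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Made-up masses from three classifiers (illustrative values only).
masses = [
    {DUP: 0.7, NON: 0.1, EITHER: 0.2},
    {DUP: 0.6, NON: 0.2, EITHER: 0.2},
    {DUP: 0.5, NON: 0.3, EITHER: 0.2},
]

fused = reduce(combine, masses)
is_duplicate = fused.get(DUP, 0.0) > fused.get(NON, 0.0)
```

Leaving some mass on `EITHER` lets a classifier express uncertainty instead of forcing a hard vote, which is the main practical difference between this kind of fusion and simple majority voting.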
Keywords
deep web, image reconstruction, data integration