Generic Entity Resolution with Data Confidences

CleanDB (2006)

Abstract
We consider the Entity Resolution (ER) problem (also known as deduplication, or merge-purge), in which records determined to represent the same real-world entity are successively located and merged. Our approach to the ER problem is generic, in the sense that the functions for comparing and merging records are viewed as black boxes. In this context, managing numerical confidences along with the data makes the ER problem more challenging to define (e.g., how should confidences of merged records be combined?), and more expensive to compute. In this paper, we propose a sound and flexible model for the ER problem with confidences, and propose efficient algorithms to solve it. We validate our algorithms through experiments that show significant performance improvements over naive schemes.
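To make the generic setting concrete, the following is a minimal sketch of a naive resolution loop in which the match and merge functions are passed in as black boxes. It is not the paper's algorithm or confidence model: the `Record` type, the `naive_resolve` loop, the `same_phone` match rule, and the choice to combine confidences by taking the minimum are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a set of (attribute, value) pairs plus a confidence."""
    attrs: FrozenSet[Tuple[str, str]]
    conf: float


MatchFn = Callable[[Record, Record], bool]
MergeFn = Callable[[Record, Record], Record]


def naive_resolve(records: List[Record], match: MatchFn, merge: MergeFn) -> List[Record]:
    """Repeatedly replace a matching pair with its merge until no pair matches.

    Assumption: a merged record replaces both of its sources. Each merge
    shrinks the list by one record, so the loop always terminates.
    """
    resolved = list(records)
    changed = True
    while changed:
        changed = False
        for i in range(len(resolved)):
            for j in range(i + 1, len(resolved)):
                if match(resolved[i], resolved[j]):
                    merged = merge(resolved[i], resolved[j])
                    resolved = [r for k, r in enumerate(resolved) if k not in (i, j)]
                    resolved.append(merged)
                    changed = True
                    break
            if changed:
                break
    return resolved


def same_phone(a: Record, b: Record) -> bool:
    """Illustrative black-box match: the records share a phone value."""
    phones_a = {v for k, v in a.attrs if k == "phone"}
    phones_b = {v for k, v in b.attrs if k == "phone"}
    return bool(phones_a & phones_b)


def merge_min_conf(a: Record, b: Record) -> Record:
    """Illustrative black-box merge: union the attributes, keep the lower confidence."""
    return Record(a.attrs | b.attrs, min(a.conf, b.conf))


records = [
    Record(frozenset({("name", "J. Doe"), ("phone", "555")}), 0.9),
    Record(frozenset({("name", "John Doe"), ("phone", "555")}), 0.7),
    Record(frozenset({("name", "A. Smith"), ("phone", "111")}), 0.8),
]
result = naive_resolve(records, same_phone, merge_min_conf)
```

Because `naive_resolve` never inspects the records itself, swapping in a different match rule or a different way of combining confidences requires no change to the loop. The quadratic rescanning after every merge is the kind of cost a naive scheme incurs.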
Keywords
entity resolution