Real-Time Entity Resolution by Forest-Based Indexing in Database Systems with Vertical Fragmentations.

CSAE (2021)

Abstract
Entity resolution (ER) is the process of identifying which tuples (records) in a relation (dataset) refer to the same real-world entity. Performing ER in real time is challenging for large datasets. Schema decomposition, which partitions a relation (table) into a set of vertical fragments, is important in (distributed) database systems. This paper studies real-time ER in that setting. By building forest-based indexes and defining ranking functions with corresponding algorithms, we propose an approach for resolving query tuples over dirty relations, stored as sets of vertical fragments, that contain duplicates, misspellings, or NULL values in text attributes. Extensive experiments are conducted to demonstrate the performance of the proposed approach.