Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing

2019 IEEE 35th International Conference on Data Engineering (ICDE) (2019)

Abstract
Given a set of records, entity resolution algorithms find all the records referring to each entity. In top-k entity resolution, the goal is to find all the records referring to the k largest entities, where size is measured by number of records. Top-k entity resolution is driven by many modern applications that operate over just the few most popular entities in a dataset. In this paper we introduce the problem of top-k entity resolution and summarize a novel approach to it; full details are presented in a technical report. Our approach is based on locality-sensitive hashing and can process massive datasets very rapidly and accurately. Our key insight is to adaptively decide how much processing each record requires to ascertain whether it refers to a top-k entity: the less likely a record is to refer to a top-k entity, the less it is processed. The heavily reduced amount of processing for the vast majority of records, which do not refer to top-k entities, leads to significant speedups. Our experiments with images, web articles, and scientific publications show a 2x to 25x speedup compared to traditional approaches for high-dimensional data.
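The adaptive idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm (the abstract defers details to a technical report); it is a minimal illustration under stated assumptions: records are non-empty token sets, similarity is approximated with MinHash built on SHA-1, and the names `minhash`, `refine`, `top_k_entities`, and the `levels` schedule are all hypothetical. The largest candidate cluster is always refined first, so records that fall into small clusters early are never hashed again.

```python
import hashlib
import heapq
from collections import defaultdict


def minhash(tokens, seed):
    # One MinHash value: the smallest seeded hash over the token set.
    # Assumes `tokens` is non-empty.
    return min(hashlib.sha1(f"{seed}:{t}".encode()).digest()[:8] for t in tokens)


def refine(records, ids, seeds):
    # Split a candidate cluster by the MinHash signature under `seeds`.
    buckets = defaultdict(list)
    for rid in ids:
        key = tuple(minhash(records[rid], s) for s in seeds)
        buckets[key].append(rid)
    return list(buckets.values())


def top_k_entities(records, k, levels=((0,), (1, 2), (3, 4, 5, 6))):
    # records: dict mapping record id -> set of tokens (hypothetical format).
    # levels: increasingly expensive hash groups (an assumed schedule).
    # Max-heap on cluster size; every record starts in one unrefined cluster.
    heap = [(-len(records), 0, sorted(records))]
    result = []
    while heap and len(result) < k:
        _, level, ids = heapq.heappop(heap)
        if level == len(levels):
            # Fully refined and at least as large as every remaining
            # candidate, because refinement can only shrink a cluster.
            result.append(ids)
            continue
        for sub in refine(records, ids, levels[level]):
            heapq.heappush(heap, (-len(sub), level + 1, sub))
    return result
```

For example, with three identical records for one entity, two for a second, and a singleton, `top_k_entities(records, 2)` returns the first two groups, and the singleton is hashed only at the cheapest level. Near-duplicate records collide only probabilistically under MinHash, so a realistic schedule would have to trade hash cost against accuracy.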
Keywords
Erbium,Computer bugs,Image resolution,Cameras,Videos,Partitioning algorithms,Hash functions