An Unsupervised Algorithm for Learning Blocking Schemes.

ICDM (2013)

Abstract
Pairwise comparison of data objects is a requisite step in many data mining applications, but it has quadratic complexity. In applications such as record linkage, blocking methods may be applied to reduce the cost: the data is first partitioned into a set of blocks, and pairwise comparisons are computed only for pairs within each block. To date, blocking methods have required either that the blocking scheme be given or that training data be provided so that supervised learning algorithms can determine a blocking scheme. In either case, a domain expert is required. This paper develops an unsupervised method for learning a blocking scheme for tabular data sets. The method is divided into two phases. In the first phase, a weakly labeled training set is generated automatically in time linear in the number of records in the entire dataset. The second phase casts blocking key discovery as a Fisher feature selection problem. The approach is compared to a state-of-the-art supervised blocking key discovery algorithm on three real-world databases and achieves favorable results.
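To make the cost argument concrete, the following is a minimal Python sketch of blocking in general, not of the paper's learning algorithm. The toy records, the `zip_key` blocking key, and the helper names `block_records` and `candidate_pairs` are all hypothetical and chosen only for illustration; in the paper, the blocking scheme is learned rather than hand-picked as it is here.

```python
from collections import defaultdict
from itertools import combinations


def block_records(records, key_fn):
    """Partition records into blocks that share the same blocking key value."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_fn(rec)].append(rec)
    return blocks


def candidate_pairs(blocks):
    """Yield record pairs that fall inside the same block."""
    for members in blocks.values():
        yield from combinations(members, 2)


# Hypothetical toy data; the field names are assumptions for this sketch.
records = [
    {"id": 1, "name": "John Smith", "zip": "90210"},
    {"id": 2, "name": "Jon Smith", "zip": "90210"},
    {"id": 3, "name": "Mary Jones", "zip": "10001"},
    {"id": 4, "name": "Marie Jones", "zip": "10001"},
    {"id": 5, "name": "Alan Turing", "zip": "90210"},
]


def zip_key(rec):
    """Hand-picked blocking key (an assumption): the postal code."""
    return rec["zip"]


blocks = block_records(records, zip_key)
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(blocks)]

# Exhaustive comparison needs n * (n - 1) / 2 pairs.
all_pairs = len(records) * (len(records) - 1) // 2

print(f"pairs without blocking: {all_pairs}")
print(f"pairs with blocking:    {len(pairs)}")
```

With this toy key, only records that share a postal code are compared, so the number of candidate pairs falls well below the exhaustive count. A poorly chosen key would also separate true matches into different blocks, which is why the choice of blocking scheme matters.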
Keywords
data mining, expert systems, unsupervised learning, Fisher feature selection problem, cost reduction, data objects, domain expert, learning blocking schemes, quadratic complexity, record linkage, supervised blocking key discovery algorithm, supervised learning algorithms, tabular data sets, unsupervised learning algorithm, weakly labeled training set, blocking