A Framework to Evaluate the Quality of Integrated Datasets

APPLIED COMPUTING REVIEW (2022)

Abstract
Evaluation is a bottleneck in data integration processes: it is performed by domain experts through onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large volume of data makes checking every integrated tuple infeasible. Our idea is to address this issue by providing the experts with an unsupervised measure, based on word frequencies, that quantifies how representative one dataset is of another, giving an indication of how good the integration process is. The paper motivates and introduces the measure and provides an extensive experimental evaluation that shows the effectiveness and efficiency of the approach.
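The abstract does not state the formula for the measure, so the following is only a hypothetical illustration of what a word-frequency-based representativeness score could look like. It assumes each dataset is a list of tuples of attribute values, tokenizes values by lowercasing and splitting on whitespace, and swaps in plain cosine similarity between the two word-frequency vectors; the function names `word_frequencies` and `representativeness` are invented for this sketch and are not the authors' method.

```python
from collections import Counter
import math


def word_frequencies(tuples):
    """Count lowercase, whitespace-separated tokens across all attribute values."""
    counts = Counter()
    for row in tuples:
        for value in row:
            counts.update(str(value).lower().split())
    return counts


def representativeness(source, integrated):
    """Hypothetical score in [0, 1]: cosine similarity of the word-frequency vectors.

    1.0 means both datasets have proportional word frequencies; 0.0 means
    they share no words (or one of them is empty).
    """
    f_src = word_frequencies(source)
    f_int = word_frequencies(integrated)
    vocab = set(f_src) | set(f_int)
    # Counter returns 0 for missing keys, so absent words contribute nothing.
    dot = sum(f_src[w] * f_int[w] for w in vocab)
    norm = math.sqrt(sum(c * c for c in f_src.values())) * math.sqrt(
        sum(c * c for c in f_int.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    source = [("Apple iPhone 12", "black"), ("Samsung Galaxy S21", "white")]
    integrated = [("Apple iPhone 12", "black")]
    # The integrated data covers half of the source vocabulary.
    print(round(representativeness(source, integrated), 3))  # 0.707
```

Cosine similarity is used here only because it is a familiar, scale-invariant way to compare frequency profiles without labeled matches; the measure actually proposed in the paper may weight or normalize frequencies differently.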
Keywords
Entity Resolution, Entity Matching, Unsupervised Evaluation, Data Integration