Probabilistic Models To Reconcile Complex Data From Inaccurate Data Sources

CAiSE'10: Proceedings of the 22nd International Conference on Advanced Information Systems Engineering (2010)

Abstract
Several techniques have been developed to extract and integrate data from web sources. However, web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence. We also report the results of several experiments on both synthetic and real-life data to show the effectiveness of the proposed approach.
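To make the idea of jointly estimating value probabilities and source accuracies concrete, the following is a minimal sketch of a generic iterative accuracy-weighted voting scheme. It is not the model from the paper: it treats sources as independent (no copier detection), handles one attribute per object, and normalizes only over the values actually observed. The function name `reconcile` and the parameters `n_false`, `init_accuracy`, and `iterations` are assumptions introduced for illustration.

```python
import math
from collections import defaultdict


def reconcile(claims, n_false=10, init_accuracy=0.8, iterations=20):
    """Estimate value probabilities and source accuracies.

    claims: dict mapping source name -> {object: claimed value}.
    n_false: assumed number of possible wrong values per object (hypothetical).
    Returns (probs, accuracy), where probs[object][value] is a probability
    and accuracy[source] is the estimated accuracy of that source.
    """
    accuracy = {source: init_accuracy for source in claims}
    probs = {}
    for _ in range(iterations):
        # Step 1: accumulate a log-odds vote for each claimed value,
        # weighted by the current accuracy estimate of the claiming source.
        scores = defaultdict(lambda: defaultdict(float))
        for source, observations in claims.items():
            # Clamp to avoid log(0) or division by zero.
            a = min(max(accuracy[source], 0.01), 0.99)
            weight = math.log(n_false * a / (1.0 - a))
            for obj, value in observations.items():
                scores[obj][value] += weight

        # Step 2: turn the scores into a distribution per object (softmax).
        probs = {}
        for obj, value_scores in scores.items():
            top = max(value_scores.values())
            exps = {v: math.exp(s - top) for v, s in value_scores.items()}
            total = sum(exps.values())
            probs[obj] = {v: e / total for v, e in exps.items()}

        # Step 3: a source's accuracy is the mean probability of its claims.
        for source, observations in claims.items():
            if observations:
                accuracy[source] = sum(
                    probs[obj][value] for obj, value in observations.items()
                ) / len(observations)
    return probs, accuracy


if __name__ == "__main__":
    # Toy data: three hypothetical sources reporting values for three objects.
    toy_claims = {
        "s1": {"AAPL": 10, "MSFT": 20, "GOOG": 30},
        "s2": {"AAPL": 10, "MSFT": 20, "GOOG": 31},
        "s3": {"AAPL": 11, "MSFT": 21, "GOOG": 31},
    }
    value_probs, source_accuracy = reconcile(toy_claims)
    print(value_probs)
    print(source_accuracy)
```

Because this sketch assumes independent sources, a group of copiers repeating the same wrong value would reinforce each other, which is the misleading consensus the abstract says the paper's model is designed to handle.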
Keywords
real-life data, web data, probabilistic model, proposed approach, web source, available evidence, inaccurate source, misleading consensus, probability distribution, complex data, inaccurate data source