Effective Change Detection Using Sampling

VLDB (2002)

Abstract
For a large-scale data-intensive environment, such as the World-Wide Web or data warehousing, we often make local copies of remote data sources. Due to limited network and computational resources, however, it is often difficult to monitor the sources constantly to check for changes and to download changed data items to the copies. In this scenario, our goal is to detect as many changes as we can using the fixed download resources that we have. In this paper we propose three sampling-based download policies that can identify more changed data items effectively. In our sampling-based approach, we first sample a small number of data items from each data source and download more data items from the sources with more changed samples. We analyze the effectiveness of the sampling-based policies and compare our proposed policies to existing ones, including the state-of-the-art frequency-based policy in (8, 11). Our experiments on synthetic and real-world data will show the relative merits of various policies and the great potential of our sampling-based policy. In certain cases, our sampling-based policy could download twice as many changed items as the best existing policy.
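The two-phase idea in the abstract (sample a few items per source, then spend the rest of the download budget on sources whose samples changed most) can be illustrated with a minimal Python sketch. This is not one of the paper's three policies, whose details the abstract does not give; it is a simple greedy variant written for illustration. The function name `sampling_based_download`, the `has_changed` callback, and the parameters `budget` and `sample_size` are all hypothetical.

```python
import random


def sampling_based_download(sources, has_changed, budget, sample_size, rng=None):
    """Greedy illustration of a sampling-based download policy (assumed design).

    sources:     dict mapping a source name to a list of item ids
    has_changed: callback(item) -> bool; each call counts as one download
    budget:      total number of downloads allowed
    sample_size: number of items sampled per source in phase 1
    Returns (set of changed items detected, number of downloads used).
    """
    rng = rng or random.Random(0)
    detected = set()
    downloads = 0
    ratios = {}      # source name -> fraction of sampled items that changed
    remaining = {}   # source name -> items not yet downloaded

    # Phase 1: download a small random sample from every source.
    for name, items in sources.items():
        k = min(sample_size, len(items), budget - downloads)
        sample = rng.sample(items, k)
        changed = [item for item in sample if has_changed(item)]
        downloads += k
        detected.update(changed)
        ratios[name] = len(changed) / k if k else 0.0
        sampled = set(sample)
        remaining[name] = [item for item in items if item not in sampled]

    # Phase 2: spend the leftover budget on the sources whose samples
    # showed the highest change ratio, most promising source first.
    for name in sorted(ratios, key=ratios.get, reverse=True):
        for item in remaining[name]:
            if downloads >= budget:
                return detected, downloads
            downloads += 1
            if has_changed(item):
                detected.add(item)
    return detected, downloads
```

With one source where every item changed and another where none did, the sketch spends all of its post-sampling budget on the first source. A real policy would also have to trade off sample size against the budget left for phase 2, which this sketch leaves as a fixed parameter.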
Keywords
remote data source, real-world data, data item, data warehousing, sampling-based download policy, sampling-based approach, fixed download resource, changed data item, effective change detection, sampling-based policy, data source, world wide web, change detection