Adaptive Clusters And Histograms Over Data Streams

IKE '05: PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE ENGINEERING (2005)

Abstract
Incremental clustering and histograms over data streams have wide applications. Non-stationary data streams require these computations to be adaptive as well as incremental; by adaptive, we mean that they reflect the properties of data from the recent past. We discuss approaches to adaptive stream computation and advocate the use of forgetting factors, in which each data item carries a weight that decays over time. We present a weighted k-means clustering algorithm that uses forgetting factors to cluster data streams adaptively. Its main advantages are simplicity and ease of use: users can dynamically change the number of clusters, as well as the decay rates of individual clusters according to how interesting those clusters are. We further show that adaptive multidimensional histograms can be maintained over real-valued data streams by treating each adaptive cluster as a bucket of the histogram. We observe that the clusters (and hence the histograms) adapt well to changes in the data. Using weighted-count range queries, we demonstrate the effectiveness of our adaptive histograms over non-stationary streams.
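The abstract does not give the paper's update rules, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that every cluster's weight is multiplied by that cluster's own forgetting factor on each arrival (decay rates may differ per cluster, as the abstract allows), that each point goes to its nearest centroid, and that the centroid and per-dimension spread are updated as an exponentially weighted mean and variance (a weighted Welford update). For the histogram side, it assumes each cluster is a bucket whose mass is spread uniformly over the box mean ± sqrt(3·variance) in each dimension, and it answers weighted-count range queries by summing the overlapping fractions of bucket weight. The names Cluster, AdaptiveKMeans, update and range_count are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    """One adaptive cluster, also used as a histogram bucket (sketch assumption)."""
    mean: List[float]   # exponentially weighted centroid
    m2: List[float]     # decayed weighted sum of squared deviations, per dimension
    weight: float       # decayed count of points absorbed so far
    decay: float        # forgetting factor in (0, 1]; 1.0 means no forgetting

    def variance(self) -> List[float]:
        # Clamp at zero to guard against floating-point round-off.
        return [max(0.0, s / self.weight) for s in self.m2]


class AdaptiveKMeans:
    """Hypothetical weighted k-means with per-cluster forgetting factors."""

    def __init__(self, k: int, default_decay: float = 0.99):
        self.k = k
        self.default_decay = default_decay
        self.clusters: List[Cluster] = []

    def update(self, x: Sequence[float]) -> int:
        """Absorb one point and return the index of the cluster it joined."""
        # Age every cluster: the weight of older points shrinks geometrically.
        # Scaling weight and m2 by the same factor leaves mean and variance unchanged.
        for c in self.clusters:
            c.weight *= c.decay
            c.m2 = [s * c.decay for s in c.m2]
        # Seed clusters from the first k points.
        if len(self.clusters) < self.k:
            self.clusters.append(
                Cluster(list(x), [0.0] * len(x), 1.0, self.default_decay))
            return len(self.clusters) - 1
        # Assign x to the nearest centroid (squared Euclidean distance).
        j = min(range(len(self.clusters)),
                key=lambda i: sum((a - b) ** 2
                                  for a, b in zip(x, self.clusters[i].mean)))
        c = self.clusters[j]
        # Weighted Welford update; the new point carries weight 1.
        c.weight += 1.0
        for d, xd in enumerate(x):
            delta = xd - c.mean[d]
            c.mean[d] += delta / c.weight
            c.m2[d] += delta * (xd - c.mean[d])
        return j

    def range_count(self, lo: Sequence[float], hi: Sequence[float]) -> float:
        """Estimate the decayed number of points inside the box [lo, hi]."""
        total = 0.0
        for c in self.clusters:
            frac = 1.0
            for d, var in enumerate(c.variance()):
                # Uniform bucket assumption: mean +/- sqrt(3 * var) per dimension.
                half = math.sqrt(3.0 * var)
                if half == 0.0:
                    # Degenerate bucket: all mass sits at the centroid.
                    frac *= 1.0 if lo[d] <= c.mean[d] <= hi[d] else 0.0
                else:
                    b_lo, b_hi = c.mean[d] - half, c.mean[d] + half
                    overlap = max(0.0, min(hi[d], b_hi) - max(lo[d], b_lo))
                    frac *= overlap / (b_hi - b_lo)
            total += c.weight * frac
        return total
```

For example, after `model = AdaptiveKMeans(k=2)` and a sequence of `model.update(point)` calls, `model.range_count((0, 0), (5, 5))` returns the decayed weighted count estimated to lie in that square. Lowering `model.clusters[j].decay` makes cluster j forget faster, which mirrors the abstract's point that decay rates can be tuned per cluster; changing the number of clusters at run time would need a merge or split policy that this sketch does not attempt.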
Keywords
non-stationary streams, adaptive clusters, forgetting factors, histograms, query processing