TWStream: Three-Way Stream Clustering

Jiarui Sun, Mingjing Du, Zhenkang Lew, Yongquan Dong

IEEE Transactions on Fuzzy Systems (2024)

Abstract
Many stream clustering algorithms have been proposed recently to mine data streams generated at high speed by hardware platforms and software applications. Density-based methods are widely used because they can handle outliers and capture clusters of arbitrary shape. However, it remains difficult to identify multi-density clusters with ambiguous boundaries in a data stream. To address these limitations, this paper introduces TWStream, a data stream clustering algorithm based on three-way decision theory. TWStream is a two-stage, density-based clustering algorithm. In the online stage, an augmented $k$NN graph is maintained incrementally to accelerate updates of the $k$NN graph. In the offline stage, TWStream introduces the concept of boundary confidence to detect cluster boundaries efficiently and reveal the potential cores of clusters. This measure integrates the skewness and sparsity of the data distribution, as well as the evolving trend of the stream. Next, a micro-cluster-based three-way clustering strategy is applied to reconstruct latent clusters. This strategy improves the clustering quality of boundary-ambiguous clusters in a stream by combining a mutual reachability-based clustering approach with a three-way assignment approach. The proposed algorithm is compared with nine competitors on 15 data streams. Experimental results show that TWStream achieves competitive performance, verifying its effectiveness. The source code of TWStream is available at https://github.com/Du-Team/TWStream .
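The three-way assignment idea mentioned in the abstract can be illustrated with a generic sketch. This is not TWStream's actual rule: the function name `three_way_assign`, the membership scores, and the thresholds `alpha` and `beta` are all assumptions made for illustration, following the common three-way decision pattern in which an object is placed in a cluster's core region, placed in its fringe (boundary) region, or left out of it.

```python
from typing import Dict, Set, Tuple


def three_way_assign(
    scores: Dict[str, float],
    alpha: float = 0.7,  # hypothetical acceptance threshold
    beta: float = 0.3,   # hypothetical rejection threshold
) -> Tuple[Set[str], Set[str]]:
    """Split cluster labels into (core, fringe) regions for one object.

    `scores` maps a cluster label to an assumed membership score in [0, 1].
    A label lands in the core region when score >= alpha, in the fringe
    region when beta <= score < alpha, and in neither region otherwise.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("expected 0 <= beta < alpha <= 1")
    core = {label for label, s in scores.items() if s >= alpha}
    fringe = {label for label, s in scores.items() if beta <= s < alpha}
    return core, fringe


# One object scored against three hypothetical clusters.
core, fringe = three_way_assign({"A": 0.9, "B": 0.5, "C": 0.1})
print(core, fringe)
```

Under these assumed thresholds, the object belongs definitely to cluster A, sits on the boundary of cluster B, and is rejected by cluster C. Deferring the decision for boundary objects, rather than forcing a hard assignment, is what distinguishes a three-way scheme from a conventional two-way one.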
Keywords
Three-way decision, data stream, three-way clustering, density-based clustering, uncertain data analysis