Temporal silhouette: validation of stream clustering robust to concept drift

Machine Learning (2023)

Abstract
Stream clustering is required in applications where data is generated continuously or periodically and must be processed with its temporal nature in mind. In the absence of a ground truth, internal validation is the only option for evaluating clustering quality. Traditional internal validation is commonly used in stream clustering as well, even though it becomes inconsistent when the data evolves. Recent trends favor incremental approaches, but these are closer to change detection than to validation, and they limit themselves by imposing online validation on online analysis. In this work we study the impact of concept drift on the validation of stream clustering and propose the Temporal Silhouette index, thereby making internal validation conform to streaming data. We conduct tests with more than 200 datasets and compare the performance of four popular stream clustering algorithms using seven validation methods (three static internal, three incremental internal, one external) and the proposed index. Results show that the Temporal Silhouette index is suitable for stream clustering validation in the presence of concept drift and different types of outliers. The demand for reliable unsupervised learning in applications that process data streams is ever-increasing, and such reliability inevitably requires validation, which highlights the significance of the approach proposed in this work.
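The claim that static internal validation becomes inconsistent under data evolution can be illustrated with a small sketch. It is not the Temporal Silhouette index, whose definition is not given in this abstract. It only contrasts the classical silhouette coefficient computed over all pooled samples with the same coefficient averaged over consecutive time windows. The synthetic 1-D data (two clusters drifting together at a constant gap of 3), the window length of 10, and the helper name `silhouette` are assumptions made for illustration.

```python
def silhouette(points, labels):
    """Mean classical silhouette coefficient for 1-D points."""
    clusters = {}
    for x, c in zip(points, labels):
        clusters.setdefault(c, []).append(x)
    scores = []
    for x, c in zip(points, labels):
        own = clusters[c]
        if len(own) < 2 or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(abs(x - z) for z in other) / len(other)
                for k, other in clusters.items() if k != c)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical stream: at each time step t, one sample per cluster.
# Both clusters drift by 0.1 per step and stay 3 units apart.
points, labels, stamps = [], [], []
for t in range(100):
    for c, offset in ((0, 0.0), (1, 3.0)):
        points.append(0.1 * t + offset)
        labels.append(c)
        stamps.append(t)

# Static validation: ignore time and pool every sample.
pooled = silhouette(points, labels)

# Time-aware comparison (illustrative only): score each window, then average.
window = 10
window_scores = []
for start in range(0, 100, window):
    idx = [i for i, t in enumerate(stamps) if start <= t < start + window]
    window_scores.append(
        silhouette([points[i] for i in idx], [labels[i] for i in idx])
    )
windowed = sum(window_scores) / len(window_scores)

print(f"pooled silhouette:   {pooled:.3f}")
print(f"windowed silhouette: {windowed:.3f}")
```

In this sketch the pooled score is low because the trajectory of one cluster overlaps the positions the other occupied earlier, even though the two clusters are well separated at every moment. The windowed score reflects that separation.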
Keywords
Stream clustering, Clustering validation, Multivariate time series