An Adaptive Framework for Multistream Classification

ACM International Conference on Information and Knowledge Management (2016)

Abstract
Data stream classification typically involves predicting labels of data instances generated from a non-stationary process. Studies in the past decade have focused on addressing various challenges related to this problem setting, such as concept drift and concept evolution. Most techniques assume that the class labels of unlabeled data instances become available soon after label prediction, for further training and drift detection. Moreover, training and test data distributions are assumed to be similar. These assumptions are not always true in practice. For instance, a semi-supervised setting that aims to use only a fraction of the labels may induce bias during data selection. Consequently, the data distributions of training and test instances may differ. In this paper, we present a novel stream classification problem setting involving two independent non-stationary data generating processes, relaxing the above two assumptions. A source stream continuously generates labeled data instances whose distribution is biased compared to that of a target stream, which generates unlabeled data instances from the same domain. The problem, which we call Multistream Classification, is to predict the class labels of data instances in the target stream while utilizing labels available on the source stream. Since concept drift can occur asynchronously on the two streams, we design an adaptive framework by proposing a technique for supervised concept drift detection in the biased source stream and unsupervised concept drift detection in the target stream. A weighted ensemble of classifiers is updated after each drift detection on either stream, while utilizing a bias correction mechanism that leverages source information to predict labels of target instances whenever necessary. We empirically evaluate the multistream classifier's performance on both real-world and synthetic datasets, comparing it with various baseline methods and with its own variants.
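To make the idea of bias correction under covariate shift more concrete, the following is a minimal illustrative sketch of importance weighting, one common way to reweight biased source data toward a target distribution. It is not the method of the paper: the histogram-based density ratio, the one-dimensional features, and the names `histogram_density_ratio` and `weighted_error` are all assumptions made for illustration.

```python
from collections import Counter


def histogram_density_ratio(source_x, target_x, bin_width=1.0, smoothing=1.0):
    """Return a function w(x) that approximates p_target(x) / p_source(x).

    Illustrative assumption: features are 1-D floats and densities are
    estimated with equal-width bins plus additive smoothing.
    """
    def bin_of(x):
        return int(x // bin_width)

    src_counts = Counter(bin_of(x) for x in source_x)
    tgt_counts = Counter(bin_of(x) for x in target_x)
    n_bins = len(set(src_counts) | set(tgt_counts))
    n_src, n_tgt = len(source_x), len(target_x)

    def weight(x):
        b = bin_of(x)
        # Smoothed per-bin probabilities; Counter returns 0 for unseen bins.
        p_src = (src_counts[b] + smoothing) / (n_src + smoothing * n_bins)
        p_tgt = (tgt_counts[b] + smoothing) / (n_tgt + smoothing * n_bins)
        return p_tgt / p_src

    return weight


def weighted_error(xs, ys, predict, weight):
    """Importance-weighted error rate of `predict` on labeled source data."""
    total = sum(weight(x) for x in xs)
    wrong = sum(weight(x) for x, y in zip(xs, ys) if predict(x) != y)
    return wrong / total


# Toy data: the source stream over-samples the region x < 1,
# while the target stream mostly produces instances with x >= 1.
source_x = [0.1, 0.2, 0.3, 1.5]
source_y = [0, 0, 0, 1]
target_x = [0.1, 1.2, 1.3, 1.7]

w = histogram_density_ratio(source_x, target_x)
# A classifier that always predicts 0 looks good on raw source data
# (1 error in 4) but worse once source instances are reweighted.
err = weighted_error(source_x, source_y, lambda x: 0, w)
```

In this toy setting the reweighted error is higher than the unweighted source error, because the single misclassified source instance lies in the region the target stream visits most often. A streaming system would need to re-estimate such weights as either distribution drifts.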
Keywords
Data Stream, Classification, Covariate Shift, Concept Drift