Online Semi-Supervised Classification on Multilabel Evolving High-Dimensional Text Streams

IEEE Transactions on Systems, Man, and Cybernetics: Systems (2023)

Abstract
The multilabel learning task aims to predict all classes associated with a given example simultaneously. The task becomes more challenging when data arrive as a stream, since it then requires an algorithm that is fast, robust, and adaptive to concept drift. In this article, we present an online semi-supervised classification algorithm (OSMTS) for multilabel text streams. By leveraging a few labeled instances, OSMTS dynamically maintains the subspace of terms for each label with a set of evolving micro-clusters. For multilabel classification, the $k$ nearest micro-clusters are used for prediction through a nonparametric Dirichlet model. To handle gradual concept drift in the term space, a triangular time function is adopted to calculate the difference between a term's arrival time and the cluster life span. Abrupt concept drift is handled by two procedures: 1) deleting outdated micro-clusters using an exponential decay function and 2) creating new micro-clusters by adopting the Chinese restaurant process based on the Dirichlet process. The experimental study compares OSMTS with 12 state-of-the-art algorithms on nine datasets in terms of classification performance, runtime, and memory consumption.
Keywords
Graphical model, micro-clusters, semi-supervised learning, text stream, topic evolution
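
The two abrupt-drift procedures named in the abstract can be illustrated with a minimal sketch. This is not the OSMTS implementation: the function names, the decay rate `lam`, the pruning `threshold`, the concentration parameter `alpha`, and the dictionary layout of a micro-cluster are all assumptions made for illustration. The sketch shows only the generic building blocks, an exponential decay of cluster weight over time and the standard Chinese restaurant process prior for opening a new cluster.

```python
import math


def decay_weight(weight, elapsed, lam=0.01):
    """Exponentially decay a cluster weight over `elapsed` time units.

    `lam` is an assumed decay rate, not a value from the paper.
    """
    return weight * math.exp(-lam * elapsed)


def prune_outdated(clusters, now, lam=0.01, threshold=0.1):
    """Keep only clusters whose decayed weight is still >= `threshold`.

    `clusters` maps an id to {"weight": float, "last_update": float};
    this layout is hypothetical.
    """
    return {
        cid: c
        for cid, c in clusters.items()
        if decay_weight(c["weight"], now - c["last_update"], lam) >= threshold
    }


def crp_probabilities(cluster_sizes, alpha=1.0):
    """Chinese restaurant process prior for the next arriving instance.

    Returns the probability of joining each existing cluster
    (proportional to its size), followed by the probability of
    opening a new cluster (proportional to `alpha`).
    """
    denom = sum(cluster_sizes) + alpha
    probs = [size / denom for size in cluster_sizes]
    probs.append(alpha / denom)
    return probs
```

Under these assumptions, a larger `alpha` makes new micro-clusters more likely, while a larger `lam` removes stale micro-clusters sooner; how OSMTS combines these priors with term likelihoods is described in the paper itself, not here.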