Clustering-Driven and Dynamically Diversified Ensemble for Drifting Data Streams
2018 IEEE International Conference on Big Data (Big Data), 2018
Abstract
Data stream mining is a rapidly developing branch of contemporary machine learning. Ensemble approaches have proven highly effective in this domain, due to their predictive power and their ability to handle evolving data. One of the key aspects of ensemble learning is diversity among base classifiers: it improves accuracy and helps an ensemble anticipate and recover from concept drifts. It has been shown that while diversity is desirable during changes, it may impede learning when data becomes stationary. In this paper, we present a novel ensemble technique that exploits the idea of dynamic diversification, which increases diversity during changes and reduces it when a stream becomes stable. The algorithm uses online clustering for this task, creating locally specialized base learners trained on spatially related instances. Three control strategies, based on a novel range heuristic, manage the trade-off between error (a change indicator) and diversity. Additionally, two intensification strategies are proposed for exploiting newly arriving instances, allowing for faster adaptation. An experimental study evaluates the general performance and diversity of the proposed algorithm and shows that it can outperform state-of-the-art ensembles dedicated to drifting data stream mining.
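The idea of dynamic diversification can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define the range heuristic, the three control strategies, or the intensification strategies, so everything below is an assumption made for illustration. In particular, `target_clusters` is a hypothetical linear mapping from a windowed error rate to a cluster count, `MajorityLearner` is a stand-in for a real base classifier, and the grow/shrink rules are arbitrary simple choices.

```python
import math
from collections import Counter, deque


class MajorityLearner:
    """Stand-in base learner: predicts the most frequent label it has seen."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, y):
        self.counts[y] += 1

    def predict(self):
        return self.counts.most_common(1)[0][0] if self.counts else None


def target_clusters(error, k_min, k_max):
    """Hypothetical rule: map an error rate in [0, 1] linearly onto [k_min, k_max]."""
    error = min(max(error, 0.0), 1.0)
    return k_min + round(error * (k_max - k_min))


class ClusterEnsemble:
    """One base learner per cluster; the cluster count follows the recent error."""

    def __init__(self, k_min=1, k_max=5, window=50, lr=0.1):
        # Assumes 1 <= k_min <= k_max.
        self.k_min, self.k_max, self.lr = k_min, k_max, lr
        self.centroids = []                 # one centroid per active cluster
        self.learners = []                  # one base learner per active cluster
        self.errors = deque(maxlen=window)  # sliding window of 0/1 mistakes

    def _nearest(self, x):
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(x, self.centroids[i]))

    def predict(self, x):
        if not self.centroids:
            return None
        return self.learners[self._nearest(x)].predict()

    def learn(self, x, y):
        # Test-then-train: record the mistake before any update.
        self.errors.append(int(self.predict(x) != y))
        error = sum(self.errors) / len(self.errors)
        k = target_clusters(error, self.k_min, self.k_max)

        if len(self.centroids) < k:
            # High error: diversify by seeding a new cluster at this instance.
            self.centroids.append(list(x))
            self.learners.append(MajorityLearner())
        elif len(self.centroids) > k:
            # Low error: reduce diversity by dropping the least-trained learner.
            drop = min(range(len(self.learners)),
                       key=lambda i: sum(self.learners[i].counts.values()))
            del self.centroids[drop]
            del self.learners[drop]

        # Update the nearest centroid online and train its local learner.
        i = self._nearest(x)
        centroid = self.centroids[i]
        for d in range(len(centroid)):
            centroid[d] += self.lr * (x[d] - centroid[d])
        self.learners[i].learn(y)
```

In this sketch, calling `learn(x, y)` once per arriving instance grows the ensemble while the windowed error is high and shrinks it again as the error falls, which mirrors the increase-then-reduce behaviour described in the abstract only at a very coarse level.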
Keywords
machine learning, data stream mining, classification, concept drift, ensemble learning, diversity