Active Learning for Streaming Networked Data

CIKM (2014)

Abstract
Mining high-speed data streams has become an important topic due to the rapid growth of online data. In this paper, we study the problem of active learning for streaming networked data. The goal is to train an accurate model for classifying networked data that arrives in a streaming manner while querying as few labels as possible. The problem is extremely challenging, as both the data distribution and the network structure may change over time. The query decision has to be made for each data instance sequentially, taking the dynamic network structure into account. We propose a novel streaming active query strategy based on structural variability. We prove that by querying labels we can monotonically decrease the structural variability and better adapt to concept drift. To speed up the learning process, we present a network sampling algorithm that samples instances from the data stream, which allows us to handle large volumes of streaming data. We evaluate the proposed approach on four datasets of different genres: Weibo, Slashdot, IMDB, and ArnetMiner. Experimental results show that our model performs much better (+5-10% in F1-score on average) than several alternative methods for active learning over streaming networked data.
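To make the streaming query setting concrete, the following is a minimal sketch of a sequential query decision: each instance arrives once, and the learner must decide immediately whether to ask for its label. It is not the paper's method. The abstract does not define structural variability, so the sketch substitutes plain uncertainty sampling on an online logistic regression and ignores the network structure entirely. All names (`StreamingActiveLearner`, `margin`, `run_demo`) and the synthetic data are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class StreamingActiveLearner:
    """Online logistic regression that queries a label only when uncertain.

    Hypothetical stand-in: uses prediction uncertainty, not the paper's
    structural-variability criterion, and has no notion of network edges.
    """

    def __init__(self, n_features, margin=0.2, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.margin = margin  # query if |p - 0.5| is below this value
        self.lr = lr
        self.queries = 0

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return sigmoid(z)

    def should_query(self, x):
        # The decision is made per instance, at arrival time.
        return abs(self.predict_proba(x) - 0.5) < self.margin

    def update(self, x, y):
        # One stochastic gradient step on the logistic loss.
        err = y - self.predict_proba(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err

    def process(self, x, oracle):
        """Handle one arriving instance; return True if its label was queried."""
        if self.should_query(x):
            self.queries += 1
            self.update(x, oracle(x))
            return True
        return False


def run_demo(n=2000, seed=0):
    """Run the learner on a synthetic 2-D stream (assumed toy data)."""
    rng = random.Random(seed)
    learner = StreamingActiveLearner(n_features=2)

    def oracle(x):
        # Synthetic ground truth: a fixed linear boundary, no drift.
        return 1 if x[0] + x[1] > 0 else 0

    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        learner.process(x, oracle)

    test = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
    correct = sum((learner.predict_proba(x) > 0.5) == (oracle(x) == 1) for x in test)
    return learner.queries, correct / len(test)


if __name__ == "__main__":
    queries, accuracy = run_demo()
    print(f"labels queried: {queries} of 2000, test accuracy: {accuracy:.3f}")
```

In this toy setup the learner stops querying instances that lie far from its current decision boundary, which is the basic label-saving effect that any streaming query strategy aims for. Handling concept drift and network structure, as the paper does, would require a different query criterion.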
Keywords
network sampling,data streams,active learning,data mining,learning