Vuvuzelas & Active Learning for Online Classification
Computational Social Science (2010)
Abstract
Many online service systems leverage user-generated content from Web 2.0-style platforms such as Wikipedia, Twitter, Facebook, and many more. Often, the value lies in the freshness of this information (e.g., tweets, event-based articles, blog posts). This freshness poses a challenge for supervised learning models, which frequently have to deal with previously unseen features. In this paper we address the problem of online classification of tweets: how can a classifier be updated in an online manner so that it correctly classifies the latest "hype" on Twitter? We propose a two-step strategy. The first step follows an active learning strategy that selects the tweets for which a label would be most useful; each selected tweet is then forwarded to Amazon Mechanical Turk, where it is labeled by multiple users. The second step builds on a Bayesian corroboration model that aggregates the noisy labels provided by the users while taking each user's reliability into account.
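The two steps can be illustrated with a small sketch. The abstract does not specify the paper's selection criterion or the exact form of its corroboration model, so the code below makes two assumptions. First, it uses uncertainty sampling (query the tweet whose predicted positive-class probability is closest to 0.5) as a stand-in for the active learning step. Second, it uses a simple one-coin expectation-maximization model for binary labels, in the spirit of Dawid-Skene, with Beta pseudo-counts on each worker's reliability, as a stand-in for the Bayesian corroboration step. The function names `select_most_uncertain` and `aggregate_labels` and the prior values are hypothetical, not taken from the paper.

```python
import math


def select_most_uncertain(probs):
    """Uncertainty sampling (assumed criterion): return the index of the
    tweet whose predicted probability of the positive class is closest to 0.5."""
    return min(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def aggregate_labels(labels, n_iter=20, prior_correct=4.0, prior_wrong=1.0):
    """Hypothetical one-coin EM aggregation of binary crowd labels.

    labels: dict mapping item id -> list of (worker id, label in {0, 1}).
    Returns (posterior P(true label = 1) per item, reliability per worker),
    where reliability is the estimated probability that a worker labels
    correctly, smoothed with Beta(prior_correct, prior_wrong) pseudo-counts.
    """
    workers = {w for votes in labels.values() for w, _ in votes}
    start = prior_correct / (prior_correct + prior_wrong)
    reliability = {w: start for w in workers}
    posterior = {}
    for _ in range(n_iter):
        # E-step: with a uniform prior on the true label, each vote adds
        # +log(r / (1 - r)) for label 1 and -log(r / (1 - r)) for label 0.
        for item, votes in labels.items():
            log_odds = 0.0
            for w, y in votes:
                r = min(max(reliability[w], 1e-6), 1 - 1e-6)
                weight = math.log(r / (1 - r))
                log_odds += weight if y == 1 else -weight
            posterior[item] = _sigmoid(log_odds)
        # M-step: a worker's reliability is their expected agreement with
        # the current posteriors, plus the Beta pseudo-counts.
        correct = {w: prior_correct for w in workers}
        total = {w: prior_correct + prior_wrong for w in workers}
        for item, votes in labels.items():
            p = posterior[item]
            for w, y in votes:
                correct[w] += p if y == 1 else 1 - p
                total[w] += 1
        reliability = {w: correct[w] / total[w] for w in workers}
    return posterior, reliability


if __name__ == "__main__":
    # Step 1: pick the tweet the current classifier is least sure about.
    print(select_most_uncertain([0.9, 0.55, 0.1]))
    # Step 2: aggregate several workers' labels for labeled tweets.
    votes = {
        "t1": [("a", 1), ("b", 1), ("c", 0)],
        "t2": [("a", 0), ("b", 0), ("c", 1)],
        "t3": [("a", 1), ("b", 1), ("c", 1)],
    }
    post, rel = aggregate_labels(votes)
    print(post, rel)
```

In this toy example, worker `c` disagrees with the majority on two of three tweets, so the model assigns `c` a lower reliability and the aggregated labels follow workers `a` and `b`. Down-weighting rather than discarding unreliable workers is what distinguishes this kind of reliability-aware aggregation from plain majority voting.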
Keywords
service system, active learning, user-generated content, supervised learning