When will Feature Feedback help? Quantifying the Complexity of Classification Problems

msra(2007)

Abstract
Supervised learning typically requires human effort to label a large number of training instances. Active learning strives to decrease the number of labeled training examples needed by actively engaging the learner and the human in an interactive process. Active learning has proven to be effective in many domains. Past work has found that, when training examples are few, the user's prior knowledge about the importance of features, or interactive feature feedback, can guide the learner to converge faster, that is, with lower labeling costs. In this paper we aim to understand the kinds of problems for which such extra feedback is significantly beneficial. In other words, we ask what kinds of problems can significantly benefit from interactive learning and whether, for some problems, the user has no choice but to engage in the tedious process of labeling many examples. Toward this goal, we define a set of four difficulty measures, two each for instance complexity and feature complexity, for linear classification problems. These measures can be computed efficiently for real-world problems for which linear classifiers are effective, such as text classification. We quantify the difficulty of 358 text classification problems and 9 corpora using our measures, illustrating the spectrum of problems that exist in text classification and quantifying results that have only been discussed qualitatively in the text classification literature. We verify the intimate relationship (a high positive correlation) between feature complexity and instance complexity using our measures. We then use these measures to understand when feature feedback is likely to be very useful. We observe that many problems in the commonly used data sets are of low to medium complexity, that is, only on the order of tens of well-selected features are required to attain most of the maximum performance on such concepts. We find that learning these kinds of problems especially stands to benefit from feature feedback. We note that our empirical difficulty measures and the rankings of problems and domains are of independent interest, beyond the active learning setting.