Creating robust supervised classifiers via web-scale N-gram data
ACL 2010
Abstract
In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
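The abstract's central idea, using counts from a web-scale auxiliary corpus as classifier features, can be illustrated with a minimal sketch. The count table, smoothing choice (log of count plus one), and the adjective-ordering decision rule below are all hypothetical stand-ins, not the paper's actual feature set or data.

```python
from math import log

# Hypothetical stand-in for counts from a web-scale auxiliary corpus
# (e.g. a web-scale N-gram collection); values here are invented.
NGRAM_COUNTS = {
    ("big", "red"): 45000,
    ("red", "big"): 1200,
}

def count_feature(ngram, counts=NGRAM_COUNTS):
    """Log-count feature for an N-gram; add-one inside the log so unseen N-grams map to 0."""
    return log(counts.get(ngram, 0) + 1)

def prefer_order(a, b, counts=NGRAM_COUNTS):
    """Toy adjective-ordering decision: keep the order whose bigram has the larger log count."""
    if count_feature((a, b), counts) >= count_feature((b, a), counts):
        return (a, b)
    return (b, a)

print(prefer_order("red", "big"))  # ('big', 'red')
```

In the paper these log-count values would be features fed to a supervised classifier alongside conventional lexical features, rather than used as a direct decision rule as in this toy example.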
Keywords
new domain,training data,web-scale n-gram data,standard data set,web-scale auxiliary corpus,web-scale n-gram feature,n-gram count feature,robust supervised classifier,state-of-the-art supervised nlp classifier,state-of-the-art accuracy,noun compound bracketing,noun,part of speech