Efficient extraction of domain specific sentiment lexicon with active learning

Pattern Recognition Letters (2015)

Abstract
Proposed a graphical model to extract a sentiment lexicon from document annotations. Applied active learning to sentiment lexicon extraction to reduce the annotation effort. Suggested and tested four distinct initialization methods for active learners. Proposed a lexicon coverage analysis algorithm to initialize the active learner.

Recent research indicates that a sentiment lexicon focused on a specific domain leads to better sentiment analyses than a general-purpose sentiment lexicon such as SentiWordNet. Despite this potential improvement, the cost of building a domain-specific sentiment lexicon hinders its wider and more practical application. To address this difficulty, we propose extracting a sentiment lexicon from a domain-specific corpus by annotating an intelligently selected subset of the documents in the corpus. Specifically, the subset is selected by an active learner with initializations from diverse text analytics, i.e., latent Dirichlet allocation and our proposed lexicon coverage algorithm. This active learning produces a better domain-specific sentiment lexicon, which results in a higher accuracy of sentiment classification. We evaluate the extracted sentiment lexicons by observing (1) the increased F1 measure in sentiment classification and (2) the increased similarity to the sentiment lexicon obtained with full annotation. We expect that this contribution will enable more accurate sentiment classification by domain-specific sentiment lexicons with less sentiment tagging effort.
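The idea of annotating only an intelligently selected subset of documents can be illustrated with a minimal pool-based active learning sketch. This is not the paper's method: it swaps the graphical model for a smoothed log-odds word score and uses plain uncertainty sampling, and it does not implement the latent Dirichlet allocation or lexicon coverage initializations. The names `build_lexicon`, `doc_score`, `active_learn`, the `oracle` callback, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def build_lexicon(labeled_docs, smoothing=1.0):
    """Score each word by smoothed log-odds of occurring in positive vs. negative documents.

    Assumption: a stand-in scoring rule, not the paper's graphical model.
    """
    pos, neg = Counter(), Counter()
    for tokens, label in labeled_docs:
        # Count each word once per document (document frequency).
        (pos if label == 1 else neg).update(set(tokens))
    n_pos = sum(1 for _, label in labeled_docs if label == 1)
    n_neg = len(labeled_docs) - n_pos
    return {
        word: math.log((pos[word] + smoothing) / (n_pos + 2 * smoothing))
        - math.log((neg[word] + smoothing) / (n_neg + 2 * smoothing))
        for word in set(pos) | set(neg)
    }


def doc_score(tokens, lexicon):
    """Sum the lexicon scores of a document's words; unknown words count as 0."""
    return sum(lexicon.get(word, 0.0) for word in tokens)


def active_learn(pool, oracle, seed_ids, budget):
    """Annotate the seed documents, then repeatedly query the most uncertain one.

    Uncertainty is taken as the smallest absolute document score, i.e. the
    document the current lexicon is least able to call positive or negative.
    """
    labeled = {i: oracle(i) for i in seed_ids}
    for _ in range(budget):
        lexicon = build_lexicon([(pool[i], y) for i, y in labeled.items()])
        candidates = [i for i in range(len(pool)) if i not in labeled]
        if not candidates:
            break
        pick = min(candidates, key=lambda i: abs(doc_score(pool[i], lexicon)))
        labeled[pick] = oracle(pick)  # ask the annotator for one more label
    lexicon = build_lexicon([(pool[i], y) for i, y in labeled.items()])
    return lexicon, labeled


# Toy corpus (hypothetical): 1 = positive, 0 = negative.
pool = [
    ["great", "battery"], ["terrible", "battery"],
    ["great", "screen"], ["terrible", "screen"],
    ["sturdy", "case"], ["flimsy", "case"],
]
gold = [1, 0, 1, 0, 1, 0]
lexicon, labeled = active_learn(pool, lambda i: gold[i], seed_ids=[0, 1], budget=2)
```

Here the seed set plays the role of the initialization step; choosing it by topic modeling or lexicon coverage, as the abstract describes, would replace the fixed `seed_ids`.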
Keywords
active learning,sentiment analysis,sentiment lexicon