Sentiment Classification From Multi-Class Imbalanced Twitter Data Using Binarization

HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2017(2017)

引用 24|浏览16
暂无评分
摘要
Twitter became one of the most dynamically developing areas of social media. Due to concise nature of messages, rapid publication and high outreach, people share more and more of their opinions, thoughts and commentaries using this medium. Sentiment analysis is a specific subsection of natural language processing that concentrates on automatically categorizing opinions and attitudes expressed in a given portion of textual information. This requires dedicated machine learning solutions that are able to handle various difficulties embedded in the nature of data. In this paper, we present an efficient framework for automatic sentiment analysis from high-dimensional and sparse datasets that suffer from multi-class imbalance. We propose to approach it by applying a one-vs-one binary decomposition and reducing the dimensionality of each pairwise class set using Multiple Correspondence Analysis. Then we apply preprocessing to alleviate the skewed distributions in reduced number of dimensions. After that, on each pair of classes we train a binary classifier and combined them using a weighted multi-class reconstruction that promotes minority classes. The proposal is evaluated on a large Twitter dataset and obtained results are in favor of the proposed solution.
更多
查看译文
关键词
Machine learning, Text mining, Sentiment analysis, Imbalanced learning, Multi-class imbalance
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要