Solving Social Media Text Classification Problems Using Code Fragment-Based XCSR.

ICTAI (2017)

Abstract
Sentiment analysis and spam detection of social media text messages are two challenging data analysis tasks because the feature vectors are sparse and high-dimensional. Learning classifier systems (LCS) are rule-based evolutionary computing systems with limited capacity to handle real-valued, sparse, high-dimensional big data sets. LCS techniques use interval-based representations to handle real-valued feature vectors. In the work presented here, the interval-based representation is replaced by genetic programming-based tree-like structures to classify high-dimensional real-valued text feature vectors. Multiple experiments are conducted on different social media text data sets, i.e. tweets, movie reviews, Amazon and Yelp reviews, and SMS and email spam messages, to evaluate the proposed scheme. Real-valued feature vectors are generated from these data sets using term frequency-inverse document frequency (TF-IDF) and/or sentiment lexicon-based features. The results show that the new encoding scheme outperforms interval-based representations on both small and large social media text data sets.
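The contrast between the two rule encodings can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the function set, tree depth, or matching rule, so the names `interval_match`, `fragment_match`, `Leaf`, `Node`, the arithmetic operators, and the `threshold` parameter are all assumptions made for illustration. The point shown is only that an interval condition needs one bound pair per feature, while a tree-like fragment refers to just the few features in its leaves.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

# Interval-based condition: one (lower, upper) pair for every feature.
Interval = Tuple[float, float]


def interval_match(condition: List[Interval], x: List[float]) -> bool:
    """A rule matches when every feature lies inside its interval."""
    return all(lo <= v <= hi for (lo, hi), v in zip(condition, x))


@dataclass
class Leaf:
    index: int  # position of a feature in the input vector


@dataclass
class Node:
    op: Callable[[float, float], float]  # hypothetical function set: +, -, *
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, x: List[float]) -> float:
    """Recursively evaluate a tree-like code fragment on a feature vector."""
    if isinstance(tree, Leaf):
        return x[tree.index]
    return tree.op(evaluate(tree.left, x), evaluate(tree.right, x))


def fragment_match(fragments: List[Tree], x: List[float],
                   threshold: float = 0.0) -> bool:
    """Assumed matching rule: every fragment must evaluate above threshold."""
    return all(evaluate(f, x) > threshold for f in fragments)


# A sparse, TF-IDF-like vector with four features (made-up values).
x = [0.0, 0.4, 0.0, 0.7]

# The interval rule has to specify bounds for all four features.
interval_rule = [(0.0, 1.0), (0.3, 0.5), (0.0, 1.0), (0.6, 1.0)]

# The fragment rule only mentions features 1 and 3.
fragment_rule = [Node(operator.add, Leaf(1), Leaf(3))]

print(interval_match(interval_rule, x))
print(fragment_match(fragment_rule, x))
```

With thousands of mostly zero features, the interval rule grows with the dimensionality of the input, whereas the fragment rule stays as small as its trees, which is one plausible reading of why a tree-based encoding suits sparse text vectors.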
Keywords
Text Classification, Sentiment Analysis, Spam Detection, Learning Classifier Systems