Enhancing Sentiment Classification in Twitter Data Through Context-Driven Text Processing and Tweet Embeddings

Vassilis C. Gerogiannis, Andreas Kanavos, Nikos Antonopoulos, Amrita Bhola, Biswaranjan Acharya

2023 IEEE 11th Region 10 Humanitarian Technology Conference (R10-HTC) (2023)

Sentiment analysis and text classification tasks rely heavily on text processing techniques. However, existing approaches often neglect domain-specific factors and rely on generic routines and pre-built dictionaries. In this paper, we investigate the impact of text processing steps on sentiment classification using Twitter data. Our approach introduces skip-gram-based word embeddings that effectively capture Twitter-specific features, such as informal language and emojis. Through rigorous experimentation, we identify the detrimental consequences of conventional text processing steps such as stop word removal and simple averaging of term vectors for tweet representation. To optimize sentiment classification, we propose new, effective steps, including the inclusion of emoji characters, measuring word importance from embeddings, aggregating term vectors into tweet embeddings, and creating a linearly separable feature space. Our results demonstrate the superiority of context-driven word embeddings in selecting important words for tweet classification, outperforming pre-built word dictionaries. Moreover, the proposed tweet embedding reduces reliance on multiple text processing steps, resulting in more accurate sentiment analysis on Twitter data.
Text processing, Tweet classification, Sentiment classification, Word embeddings, Feature extraction, Machine learning