A Semi-supervised Corpus Annotation for Saudi Sentiment Analysis Using Twitter

BICS(2018)

引用 4|浏览0
暂无评分
摘要
In the literature, limited work has been conducted to develop sentiment resources for Saudi dialect. The lack of resources such as dialectical lexicons and corpora are some of the major bottlenecks to the successful development of Arabic sentiment analysis models. In this paper, a semi-supervised approach is presented to construct an annotated sentiment corpus for Saudi dialect using Twitter. The presented approach is primarily based on a list of lexicons built by using word embedding techniques such as word2vec. A huge corpus extracted from twitter is annotated and manually reviewed to exclude incorrect annotated tweets which is publicly available. For corpus validation, state-of-the-art classification algorithms (such as Logistic Regression, Support Vector Machine, and Naive Bayes) are applied and evaluated. Simulation results demonstrate that the Naive Bayes algorithm outperformed all other approaches and achieved accuracy up to 91%.
更多
查看译文
关键词
Sentiment analysis, Saudi dialect, Word embedding
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要