Automatic Arabic Dialect Classification Using Deep Learning Models.

Procedia Computer Science (2018)

Abstract
The widespread use of social media and the high availability of internet access have recently produced textual data that differ considerably from the formal, standard text on the Web. These data include various Arabic dialects, which are the native spoken varieties of Arabic speakers. The presence of dialectal Arabic text on the Web has brought many new opportunities as well as challenges for machine learning and Arabic language processing. Identifying this type of informal data is crucial for several applications, such as sentiment analysis and machine translation. However, the standard NLP tools developed for traditional data fall short because of the nature of dialectal text. Deep learning tools have proven very effective at processing dialectal text from social media. In this paper, we consider a variety of deep learning models for the automatic classification of Arabic dialectal text. We use a large, free, manually annotated dataset known as the Arabic Online Commentary (AOC), which includes several Dialectal Arabic (DA) varieties along with Modern Standard Arabic (MSA) [3]. We consider the most frequent dialects in the dataset, namely Egyptian (EGP), Levantine (LEV), and Gulf, including Iraqi (GLF). Four different deep neural network models were implemented to examine the Arabic dialect classification problem for each pair of the three dialects (binary classification experiments), as well as in one ternary classification experiment covering all three dialects together. The results show varying but promising performance of the models for each pair of dialects. Furthermore, a closer examination of the manually annotated AOC dataset was carried out, from which we conclude that the AOC annotated sentences are in serious need of thorough review and refinement, as AOC is an important benchmark dataset in the field.
Keywords
deep learning models, classification, Arabic dialects, AOC dataset