Language and dialect identification in social media analysis

Proceedings of SPIE (2014)

Abstract
Historically unwritten Arabic dialects are increasingly appearing online in social media texts and are often intermixed with other languages, including Modern Standard Arabic, English, and French. The next-generation analyst will need new capabilities to quickly distinguish among the languages appearing in a given text and to identify informative patterns of language switching that occur within a user's social network, patterns that may correspond to socio-cultural aspects such as participants' perceived and projected group identity. This paper presents work to (i) collect texts written in Moroccan Darija, a low-resource Arabic dialect from North Africa, and (ii) build an annotation tool that (iii) supports development of automatic language and dialect identification and (iv) provides social and information network visualizations of languages identified in tweet conversations.
Keywords
Moroccan Arabic, dialectal Arabic, Romanized Arabic, language identification, annotation