A Multilingual Evaluation for Online Hate Speech Detection

ACM Transactions on Internet Technology(2020)

引用 104|浏览86
暂无评分
摘要
The increasing popularity of social media platforms such as Twitter and Facebook has led to a rise in the presence of hate and aggressive speech on these platforms. Despite the number of approaches recently proposed in the Natural Language Processing research area for detecting these forms of abusive language, the issue of identifying hate speech at scale is still an unsolved problem. In this article, we propose a robust neural architecture that is shown to perform in a satisfactory way across different languages; namely, English, Italian, and German. We address an extensive analysis of the obtained experimental results over the three languages to gain a better understanding of the contribution of the different components employed in the system, both from the architecture point of view (i.e., Long Short Term Memory, Gated Recurrent Unit, and bidirectional Long Short Term Memory) and from the feature selection point of view (i.e., ngrams, social network–specific features, emotion lexica, emojis, word embeddings). To address such in-depth analysis, we use three freely available datasets for hate speech detection on social media in English, Italian, and German.
更多
查看译文
关键词
Hate speech detection,multilingual data,social media,text classification
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要