WokeGPT: Improving Counterspeech Generation Against Online Hate Speech by Intelligently Augmenting Datasets Using a Novel Metric.

IJCNN (2023)

Abstract
With hate speech spreading rapidly online, it is increasingly important to respond to it automatically. However, developing systems that produce these responses, known as counterspeech, faces several critical limitations. First, datasets containing paired instances of hate speech and an appropriate response are very small. Hate speech is abundant on the web and in structured datasets, but quality counterspeech is rare. Because such paired data is scarce, automated methods are needed to intelligently enlarge existing paired datasets. A second challenge is that existing Natural Language Generation (NLG) metrics are not suitable for evaluating such systems, because they do not accurately reflect how a human interprets the relationship between a hate speech and its counterspeech. Third, language models trained on internet text often exhibit substantial bias, which makes them unsuitable for sensitive tasks such as counterspeech generation. To address these challenges, we first introduce a technique that augments a small paired dataset of hate speech and counterspeech, making it substantially larger and more varied, by appropriately matching unpaired instances of hate speech with synthetic and existing counterspeech. Next, we identify the need for a metric that evaluates counterspeech the way humans do, and we propose a novel metric called PD-Score that leverages an advanced debating system. Through a large survey, we show empirically that existing NLG metrics correlate poorly with human assessment, whereas our metric correlates much more closely with it. Finally, we curate a large domain-specific text corpus called WokeCorpus, which we use to pretrain the language model before fine-tuning it to produce counterspeech. We show that this pretraining both debiases the language model and improves performance.
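The abstract states that existing NLG metrics correlate poorly with human assessment but does not say which correlation statistic was used. As a minimal sketch, assuming a rank correlation (Spearman's rho) between per-example metric scores and averaged human ratings, such a comparison could be computed as below. The function names and inputs are hypothetical and are not taken from the paper.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sxx * syy) ** 0.5


def spearman(metric_scores, human_ratings):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(rank(metric_scores), rank(human_ratings))
```

A metric whose scores order the examples the same way human raters do gets a value near 1, while a metric that is unrelated to human judgment gets a value near 0. A rank-based statistic is used here because it only assumes the metric and the ratings agree on ordering, not on scale.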
Keywords
Natural Language Generation, Counterspeech Generation, Hate Speech