Analyzing the Influence of Language Model-Generated Responses in Mitigating Hate Speech on Social Media Directed at Ukrainian Refugees in Poland
CoRR (2023)
Abstract
In the context of escalating hate speech and polarization on social media,
this study investigates the potential of employing responses generated by Large
Language Models (LLMs), complemented with pertinent verified knowledge links, to
counteract such trends. Through extensive A/B testing involving the posting of
753 automatically generated responses, the goal was to minimize the propagation
of hate speech directed at Ukrainian refugees in Poland.
The results indicate that deploying LLM-generated responses as replies to
harmful tweets effectively diminishes user engagement, as measured by
likes/impressions. When we respond to an original tweet (i.e., one that is not a
reply), we reduce user engagement by over 20% without increasing the
number of impressions. On the other hand, our responses increase the ratio of
the number of replies to a harmful tweet to impressions, especially if the
harmful tweet is not original. Additionally, the study examines how generated
responses influence the overall sentiment of tweets in the discussion,
revealing that our intervention does not significantly alter the mean
sentiment.
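
As a rough illustration of the engagement measures named above (likes/impressions and replies/impressions), the following minimal sketch compares a control group with a treated group. The `TweetStats` class, the function names, the pooled (sum over sum) aggregation, and all numbers are assumptions made for illustration only; they are not the study's data or its statistical procedure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TweetStats:
    """Hypothetical per-tweet counters; field names are illustrative."""
    likes: int
    impressions: int
    replies: int


def engagement(group: Iterable[TweetStats]) -> float:
    """Pooled likes/impressions ratio for a group of tweets."""
    tweets: List[TweetStats] = list(group)
    total_impressions = sum(t.impressions for t in tweets)
    if total_impressions == 0:
        raise ValueError("group has no impressions")
    return sum(t.likes for t in tweets) / total_impressions


def reply_ratio(group: Iterable[TweetStats]) -> float:
    """Pooled replies/impressions ratio for a group of tweets."""
    tweets: List[TweetStats] = list(group)
    total_impressions = sum(t.impressions for t in tweets)
    if total_impressions == 0:
        raise ValueError("group has no impressions")
    return sum(t.replies for t in tweets) / total_impressions


def relative_change(treated: float, control: float) -> float:
    """Relative change of the treated value with respect to the control value."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (treated - control) / control


# Made-up numbers, chosen only to show the arithmetic.
control = [TweetStats(likes=10, impressions=1000, replies=2),
           TweetStats(likes=30, impressions=3000, replies=4)]
treated = [TweetStats(likes=8, impressions=1000, replies=3),
           TweetStats(likes=24, impressions=3000, replies=9)]

like_change = relative_change(engagement(treated), engagement(control))
reply_change = relative_change(reply_ratio(treated), reply_ratio(control))
print(f"likes/impressions change: {like_change:+.1%}")
print(f"replies/impressions change: {reply_change:+.1%}")
```

A negative change in likes/impressions alongside a positive change in replies/impressions would mirror the direction of the effects reported here. Whether a difference is statistically significant requires a proper test, which this sketch does not attempt.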
This paper suggests the implementation of an automatic moderation system to
combat hate speech on social media and provides an in-depth analysis of the A/B
experiment, covering methodology, data collection, and statistical outcomes.
Ethical considerations and challenges are also discussed, offering guidance for
the development of discourse moderation systems leveraging the capabilities of
generative AI.