Generative Deduplication For Social Media Data Selection
CoRR (2024)
Abstract
Because social media data is noisy, it contains substantial redundancy,
which increases training time and model bias. To address this
issue, we propose a novel approach called generative deduplication. It aims to
remove duplicate text from noisy social media data and mitigate model bias. By
doing so, it can improve social media language understanding performance and
save training time. Extensive experiments demonstrate that the proposed
generative deduplication can effectively reduce the number of training samples
while improving performance. These results support the effectiveness of
generative deduplication and its importance in social media language understanding.