SCStory: Self-supervised and Continual Online Story Discovery
WWW 2023 (2023)
Abstract
We present SCStory, a framework for online story discovery that helps people
digest rapidly published news article streams in real time without human
annotations. To organize news article streams into stories, existing approaches
directly encode the articles and cluster them based on representation
similarity. However, these methods yield noisy and inaccurate story discovery
results because the generic article embeddings do not effectively reflect the
story-indicative semantics in an article and cannot adapt to the rapidly
evolving news article streams. SCStory employs self-supervised and continual
learning with a novel idea of story-indicative adaptive modeling of news
article streams. With a lightweight hierarchical embedding module that first
learns sentence representations and then article representations, SCStory
identifies story-relevant information in news articles and uses it to
discover stories. The embedding module is continuously updated to adapt to
evolving news streams with a contrastive learning objective, supported by two
techniques, confidence-aware memory replay and prioritized-augmentation, that
address the label absence and data scarcity problems. Thorough experiments on
real and recent news data sets demonstrate that SCStory outperforms
existing state-of-the-art algorithms for unsupervised online story discovery.
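
As a rough illustration of the similarity-based assignment step that the abstract attributes to existing approaches (encode each article, then group articles by representation similarity), the following is a minimal sketch and not the SCStory method itself. It assumes article embeddings are already available as plain lists of floats; the class name OnlineStoryClusterer, the centroid-based story representation, and the threshold value are all hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class OnlineStoryClusterer:
    """Hypothetical baseline: assign each incoming article to the most
    similar existing story, or start a new story if none is close enough."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off
        self.sums = []              # per-story sum of member embeddings
        self.counts = []            # per-story number of articles

    def assign(self, embedding):
        best_id, best_sim = None, -1.0
        for story_id, (total, n) in enumerate(zip(self.sums, self.counts)):
            centroid = [x / n for x in total]
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best_id, best_sim = story_id, sim
        if best_id is not None and best_sim >= self.threshold:
            # Join the closest story and update its running sum.
            self.sums[best_id] = [s + e for s, e in zip(self.sums[best_id], embedding)]
            self.counts[best_id] += 1
            return best_id
        # No story is similar enough: open a new one.
        self.sums.append(list(embedding))
        self.counts.append(1)
        return len(self.sums) - 1


clusterer = OnlineStoryClusterer(threshold=0.8)
story_ids = [clusterer.assign(e) for e in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])]
```

In this toy stream the first two embeddings point in nearly the same direction and share a story, while the third is nearly orthogonal and opens a new one. The abstract's argument is that fixed, generic embeddings fed into a step like this give noisy results, which is what motivates a continually updated, story-indicative embedding module.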