Multi-Source Collection of Event-Labeled News Documents
Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR '19), 2019
Abstract
In this paper, we present a collection of news documents labeled at the level of crisp events. Unlike other publicly available collections, our dataset consists of heterogeneous documents published by popular news channels on different platforms within the same temporal window and therefore dealing with roughly the same events and topics. The collection spans 4 months and comprises 147K news documents from 27 news streams, i.e., 9 different channels and 3 platforms: Twitter, RSS portals, and news websites. We also provide relevance labels of news documents for selected events; these relevance judgments were collected through crowdsourcing. The collection can be useful to researchers investigating challenging news-mining tasks, such as event detection and tracking, multi-stream analysis, and temporal analysis of news publishing patterns.
Keywords
test collections, news streams, event detection and analysis