LSIF: A System for Large-Scale Information Flow Detection Based on Topic-Related Semantic Similarity Measurement.

WI-IAT(2015)

Abstract
Information flow detection tracks the dynamics and evolution of information as it spreads across the web over time. The main challenges are choosing a suitable information granularity for detection and tracking how one piece of information evolves into another. In addition, the technical problem of doing this efficiently on large-scale data has yet to be solved. In this paper, we propose a system (LSIF) for large-scale, topic-related semantic information flow detection. We treat the sentence as the basic information unit. We represent each word or sentence as a continuous high-dimensional vector, built with word embeddings and the Fisher kernel, and use these vectors to measure semantic similarity. To handle large-scale information efficiently, we propose a dimension reduction framework called Random Reference Reduction (3R). Furthermore, we adopt a novel clustering algorithm to extract memes (a meme being a piece of information together with its variants) and analyze how memes evolve. We demonstrate the effectiveness of our approach on two terabyte-level datasets. The first is a dataset used by previous researchers, on which we conducted a series of experiments to evaluate performance. The results show that our approach is more effective and more efficient than state-of-the-art methods. The second is a 5-terabyte dataset crawled from 20 Chinese news sites. We visualize the detected information flows and extract 9 million memes from the Chinese dataset, a process that takes about two days.
Keywords
information flow, semantic similarity, dimension reduction
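
As a rough illustration of what measuring semantic similarity between sentence vectors can look like, here is a minimal sketch. It is not the authors' method: the abstract gives no implementation details, so this sketch swaps the Fisher kernel for plain mean-pooling of word vectors and compares sentences by cosine similarity. The tiny embedding table and all function names are hypothetical, chosen only for the example.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for this example.
# Real word embeddings are learned from data and have far more dimensions.
TOY_EMBEDDINGS = {
    "stocks": [0.9, 0.1, 0.0],
    "shares": [0.8, 0.2, 0.1],
    "fell": [0.1, 0.9, 0.0],
    "dropped": [0.2, 0.8, 0.1],
    "rain": [0.0, 0.1, 0.9],
    "today": [0.3, 0.3, 0.3],
}


def sentence_vector(sentence, embeddings=TOY_EMBEDDINGS):
    """Mean-pool the vectors of known words.

    Assumption: mean-pooling stands in for the Fisher-kernel aggregation
    mentioned in the abstract, which is not reproduced here.
    """
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("sentence contains no known words")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_similarity(s1, s2):
    """Semantic similarity of two sentences via their pooled vectors."""
    return cosine_similarity(sentence_vector(s1), sentence_vector(s2))


if __name__ == "__main__":
    base = "stocks fell today"
    print(sentence_similarity(base, "shares dropped today"))  # paraphrase
    print(sentence_similarity(base, "rain today"))            # unrelated
```

With these toy vectors, the paraphrase pair scores higher than the unrelated pair. A system working at the scale the abstract describes would additionally need dimension reduction and clustering on top of such pairwise scores.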