Text Network Exploration via Heterogeneous Web of Topics

2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW)(2016)

引用 5|浏览46
暂无评分
摘要
A text network refers to a data type that each vertex is associated with a text document and the relationship between documents is represented by edges. The proliferation of text networks such as hyperlinked webpages and academic citation networks has led to an increasing demand for quickly developing a general sense of a new text network, namely text network exploration. In this paper, we address the problem of text network exploration through constructing a heterogeneous web of topics, which allows people to investigate a text network associating word level with document level. To achieve this, a probabilistic generative model for text and links is proposed, where three different relationships in the heterogeneous topic web are quantified. We also develop a prototype demo system named TopicAtlas to exhibit such heterogeneous topic web, and demonstrate how this system can facilitate the task of text network exploration. Extensive qualitative analyses are included to verify the effectiveness of this heterogeneous topic web. Besides, we validate our model on real-life text networks, showing that it preserves good performance on objective evaluation metrics.
更多
查看译文
关键词
text network,heterogeneous topic web,generative model
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要