GIN: A Clustering Model for Capturing Dual Heterogeneity in Networked Data.
SDM (2015)
Abstract
Networked data often consists of interconnected, multityped nodes and links. A common assumption behind such heterogeneity is a shared clustering structure. However, existing network clustering approaches oversimplify the heterogeneity by treating either the nodes or the links in a homogeneous fashion, resulting in a massive loss of information. In addition, these studies are largely restricted to specific network schemas or applications, and so lose generality. In this paper, we introduce a flexible model that explains how heterogeneous links are formed from the shared clustering information of heterogeneous nodes. Specifically, we divide the link generation process into binary and weighted cases and model each separately. We show that these two cases can be seamlessly integrated into a unified model. We propose to maximize a joint log-likelihood function, inferring the model efficiently with an Expectation-Maximization (EM) algorithm. Experiments on real-world networked data sets demonstrate the effectiveness and flexibility of the proposed method in fully capturing the dual heterogeneity of both nodes and links.
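To make the idea of clustering by maximizing a log-likelihood with EM concrete, here is a minimal sketch for the binary-link case only. It is not the GIN model from the paper: it assumes a plain Bernoulli mixture over each node's binary link vector, ignores node and link types, and uses hypothetical names (`em_bernoulli_mixture`, `rows`, `eps`) together with a simple deterministic initialization chosen purely for illustration.

```python
import math


def em_bernoulli_mixture(rows, k, iters=30, eps=1e-6):
    """Cluster binary link vectors with EM on a Bernoulli mixture.

    Illustrative assumption, not the paper's model: each row is one node's
    0/1 link indicator vector, and each cluster has its own link probabilities.
    """
    n, d = len(rows), len(rows[0])
    weights = [1.0 / k] * k
    # Assumed deterministic initialization: seed cluster c from an evenly
    # spaced row, shrunk toward 0.5 so every probability lies in (0, 1).
    probs = [[0.25 + 0.5 * rows[(c * n) // k][j] for j in range(d)]
             for c in range(k)]
    history = []  # log-likelihood evaluated at the start of each iteration

    for _ in range(iters):
        # E-step: posterior responsibility of each cluster for each row.
        resp, loglik = [], 0.0
        for x in rows:
            logs = [
                math.log(weights[c]) + sum(
                    math.log(probs[c][j] if x[j] else 1.0 - probs[c][j])
                    for j in range(d))
                for c in range(k)
            ]
            top = max(logs)  # log-sum-exp for numerical stability
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += norm
            resp.append([math.exp(v - norm) for v in logs])
        history.append(loglik)

        # M-step: re-estimate parameters; eps is a small smoothing term
        # that keeps weights and probabilities strictly inside (0, 1).
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = (nc + eps) / (n + k * eps)
            for j in range(d):
                s = sum(resp[i][c] * rows[i][j] for i in range(n))
                probs[c][j] = (s + eps) / (nc + 2.0 * eps)

    # Hard labels come from the responsibilities of the last E-step.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, weights, probs, history


# Toy binary link matrix: rows are source nodes, columns are target nodes.
rows = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
labels, weights, probs, history = em_bernoulli_mixture(rows, k=2)
```

On this toy input the first three rows and the last three rows end up in different clusters. A weighted-link variant would replace the Bernoulli term with a distribution over link weights, but which distribution the paper uses is not stated in the abstract.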