GMMDA: Gaussian Mixture Modeling of Graph in Latent Space for Graph Data Augmentation

23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023(2023)

Abstract
Graph data augmentation (GDA), which manipulates graph structure and/or attributes, has been shown to be an effective way to improve the generalization of graph neural networks on semi-supervised node classification. For any data augmentation technique, label preservation is critical: node labels should not change after the data is manipulated. However, most existing methods overlook this requirement. Determining whether a GDA method preserves labels is highly challenging, owing to the non-Euclidean nature of the graph structure. In this study, for the first time, we formulate a label-preserving problem (LPP) in the context of GDA. The LPP is formulated as an optimization problem in which, given a fixed augmentation budget, the objective is to find an augmented graph with minimal difference in data distribution compared to the original graph. To solve the LPP, we propose GMMDA, a generative data augmentation (DA) method based on Gaussian mixture modeling (GMM) of a graph in a latent space. The proposed GMMDA has three phases. First, a novel objective is designed to jointly learn a low-dimensional graph representation and estimate the GMM. Second, samples are drawn from the GMM and converted back to the graph as additional nodes. Third, to uphold label preservation, a minimum description length (MDL)-based method selects the set of samples that produces the minimum shift in the data distribution. Through experiments, we demonstrate that GMMDA can improve the performance of a graph convolutional network on CORA, CITESEER and PUBMED by as much as 7.75%, 8.75% and 5.87%, respectively, significantly outperforming the state-of-the-art methods.
Keywords
graph data augmentation, graph neural networks, Gaussian mixture model, semi-supervised learning, minimum description length principle
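
The sample-then-select idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and it makes several simplifying assumptions: the latent embeddings are taken as given (toy 2-D points rather than a learned graph representation), one diagonal Gaussian is fitted per class instead of jointly learning the representation and the GMM, the MDL-based selection is replaced by a plain likelihood ranking under a fixed budget, and the step that converts samples back into graph nodes is omitted. All function names (`fit_class_gaussians`, `log_density`, `augment`) are hypothetical.

```python
import math
import random


def fit_class_gaussians(latent, labels):
    """Fit one diagonal Gaussian per class; returns {class: (weight, mean, var)}."""
    params = {}
    n = len(latent)
    for c in sorted(set(labels)):
        pts = [z for z, y in zip(latent, labels) if y == c]
        dim = len(pts[0])
        mean = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
        # Small constant keeps variances strictly positive.
        var = [sum((p[d] - mean[d]) ** 2 for p in pts) / len(pts) + 1e-6
               for d in range(dim)]
        params[c] = (len(pts) / n, mean, var)
    return params


def log_density(z, mean, var):
    """Log-density of z under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(z, mean, var))


def posterior_class(z, params):
    """Most probable mixture component for z."""
    return max(params, key=lambda k: math.log(params[k][0])
               + log_density(z, params[k][1], params[k][2]))


def augment(params, n_candidates, budget, rng):
    """Sample candidates from the mixture and keep at most `budget` of them.

    A candidate is kept only if its most probable component matches the
    component it was drawn from (a crude label-preservation check).
    Ranking by log-density is a stand-in for the paper's MDL criterion.
    """
    classes = list(params)
    weights = [params[c][0] for c in classes]
    kept = []
    for _ in range(n_candidates):
        c = rng.choices(classes, weights=weights)[0]
        _, mean, var = params[c]
        z = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
        if posterior_class(z, params) == c:
            kept.append((log_density(z, mean, var), z, c))
    kept.sort(key=lambda t: t[0], reverse=True)
    return [(z, c) for _, z, c in kept[:budget]]


# Toy latent space: two well-separated classes in 2-D.
rng = random.Random(0)
latent = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
          + [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(50)])
labels = [0] * 50 + [1] * 50

params = fit_class_gaussians(latent, labels)
new_nodes = augment(params, n_candidates=200, budget=10, rng=rng)
```

Each entry of `new_nodes` is a `(latent_vector, label)` pair. In a full pipeline these vectors would still need to be decoded into node features and edges, and the ranking step would be replaced by a criterion that measures distribution shift over the selected set as a whole.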