# ProjE: Embedding Projection for Knowledge Graph Completion

AAAI, 2017.

Keywords:

Recurrent Neural Network; Structured Embedding; triple; projection embedding model; Convolutional Neural Network

Abstract:

With the large volume of new information created every day, determining the validity of information in a knowledge graph and filling in its missing parts are crucial tasks for many researchers and practitioners. To address this challenge, a number of knowledge graph completion methods have been developed using low-dimensional graph embeddings…


Introduction

- Knowledge Graphs (KGs) have become a crucial resource for many tasks in machine learning, data mining, and artificial intelligence applications including question answering (Unger et al 2012), entity disambiguation (Cucerzan 2007), named entity linking (Hachey et al 2013), fact checking (Shi and Weninger 2016), and link prediction (Nickel et al 2015) to name a few.
- It is necessary to develop knowledge graph completion (KGC) methods to find missing or errant relationships with the goal of improving the general quality of KGs, which, in turn, can be used to improve or create interesting downstream applications.
- (Entity Ranking Problem) Given a Knowledge Graph G = {E, R} and an input triple ⟨h, r, ?⟩, the entity ranking problem attempts to find the optimal ordered list such that ∀ei ∈ E⁺, ∀ej ∈ E⁻ : ei ≺ ej, where E⁺ is the set of candidate entities that complete a true triple ⟨h, r, e⟩ ∈ G and E⁻ is the set of remaining candidates.
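The ranking view above can be sketched in a few lines; the scores here are hypothetical stand-ins for a trained model's output, not the paper's learned scoring function:

```python
import numpy as np

def rank_candidates(scores: np.ndarray) -> np.ndarray:
    """Return candidate-entity indices ordered by descending ranking score."""
    return np.argsort(-scores)

# Five candidate entities for a query <h, r, ?>; suppose entities 1 and 3
# complete true triples (E+), the rest are negatives (E-).
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.05])
order = rank_candidates(scores)
print(order)  # [1 3 2 0 4]: both positives precede every negative
```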

Highlights

- Knowledge Graphs (KGs) have become a crucial resource for many tasks in machine learning, data mining, and artificial intelligence applications including question answering (Unger et al 2012), entity disambiguation (Cucerzan 2007), named entity linking (Hachey et al 2013), fact checking (Shi and Weninger 2016), and link prediction (Nickel et al 2015) to name a few
- When discussing the details of the present work, we focus on the entity prediction task; it is straightforward to adapt the methodology to the relationship prediction task by changing the input
- We evaluate the projection embedding model (ProjE) on entity prediction and relationship prediction tasks, and compare its performance against several existing methods using the experimental procedures, datasets, and metrics established in the related work
- Although softmax is usually used in mutually exclusive multi-class classification problems and sigmoid is a more natural choice for non-exclusive cases like the knowledge graph completion task, we find that ProjE_listwise and ProjE_wlistwise perform better than ProjE_pointwise in most cases
- The contributions of the present work are as follows: 1) we view the knowledge graph completion task as a ranking problem and project candidate-entities onto a vector representing a combined embedding of the known parts of an input triple and order the ranking score vector in descending order; 2) we show that by optimizing the ranking score vector collectively using the listwise projection embedding model variation, we can significantly improve prediction performance; 3) projection embedding model uses only directly connected, length-1 paths during training, and has a relatively simple 2-layer structure, yet outperforms complex models that have a richer parameter or feature set; and 4) unlike other models (e.g., Compositional Vector Space Model, RTransE, DKRL), the present work does not require any pre-trained embeddings and has many fewer parameters than related models
- We show that projection embedding model can outperform existing methods on fact checking tasks
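The softmax-versus-sigmoid distinction discussed above can be illustrated with a toy sketch; the candidate scores are made up for illustration, not the model's learned h(e, r):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

raw = np.array([2.0, -1.0, 0.5])  # illustrative raw candidate scores

# Listwise view: softmax normalizes over all candidates jointly, so the
# scores compete for a single unit of probability mass.
print(softmax(raw).sum())  # 1.0

# Pointwise view: sigmoid scores each candidate independently, so several
# candidates can be "true" at once, matching the non-exclusive KGC setting.
print(sigmoid(raw))
```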

Methods

**Ranking Method and Loss Function**

As defined in Defn. 1, the authors view the KGC problem as a ranking task in which all positive candidates precede all negative candidates, and train the model accordingly.
- Although most existing KGC models, including TransE, TransR, TransH, and HolE, use a pairwise ranking loss function during training, their ranking scores are calculated independently in what is essentially a pointwise method when deployed.
- Because the authors maximize the likelihood between the ranking score vector h(e, r) and the binary label vector, it is intuitive to view this task as a multiclass classification problem.
- The loss function of ProjE_pointwise can be defined in a familiar (binary cross-entropy) way: L(e, r, y) = −Σᵢ [ yᵢ log h(e, r)ᵢ + (1 − yᵢ) log(1 − h(e, r)ᵢ) ]
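The pointwise objective described above, maximizing the likelihood between the sigmoid ranking scores and the binary label vector, amounts to binary cross-entropy. A minimal sketch, where the raw scores stand in for the model's combined embedding output:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def pointwise_loss(raw_scores: np.ndarray, y: np.ndarray) -> float:
    """Binary cross-entropy between sigmoid scores h(e, r) and labels y."""
    h = sigmoid(raw_scores)
    eps = 1e-12  # guard against log(0)
    return float(-np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)))

y = np.array([1.0, 0.0, 1.0, 0.0])
good = pointwise_loss(np.array([5.0, -5.0, 5.0, -5.0]), y)  # aligned with labels
bad = pointwise_loss(np.array([-5.0, 5.0, -5.0, 5.0]), y)   # anti-aligned
assert good < bad  # scores that agree with the labels yield a lower loss
```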

Results

- Although researchers continue to improve these models using an increasingly complex feature space, the authors show that simple changes in the architecture of the underlying model can outperform state-of-the-art models without the need for complex feature engineering.
- ProjE has a parameter size that is smaller than 11 out of 15 existing methods while performing 37% better than the current-best method on standard datasets

Conclusion

- The contributions of the present work are as follows: 1) the authors view the KGC task as a ranking problem and project candidate-entities onto a vector representing a combined embedding of the known parts of an input triple and order the ranking score vector in descending order; 2) the authors show that by optimizing the ranking score vector collectively using the listwise ProjE variation, the authors can significantly improve prediction performance; 3) ProjE uses only directly connected, length-1 paths during training, and has a relatively simple 2-layer structure, yet outperforms complex models that have a richer parameter or feature set; and 4) unlike other models (e.g., CVSM, RTransE, DKRL), the present work does not require any pre-trained embeddings and has many fewer parameters than related models.
- The authors plan to use information from complex paths in the KG to clearly summarize the many complicated ways in which entities are connected.

Summary


- Table1: Parameter size and prerequisites of KGC models in increasing order. ProjE, ranked 5th, is highlighted. ne, nr, nw, and k are the number of entities, relationships, and words, and the embedding size in the KG, respectively. z is the hidden layer size. q† represents the number of RNN parameters in RTransE; this value is not specified, but should be 8k² if a normal LSTM is used
- Table2: Entity prediction on FB15K dataset. Missing values indicate scores not reported in the original work
- Table3: Relationship prediction on FB15K dataset
- Table4: AUC scores of fact checking test cases on DBpedia and SemMedDB

Related work

- A variety of low-dimensional representation-based methods have been developed to work on the KGC task. These methods usually learn continuous, low-dimensional vector representations (i.e., embeddings) for entities WE and relationships WR by minimizing a margin-based pairwise ranking loss (Lin, Liu, and Sun 2015).

The most widely used embedding model in this category is TransE (Bordes et al 2013), which views relationships as translations from a head entity to a tail entity on the same low-dimensional plane. The energy function of TransE is defined as

E(h, r, t) = ‖h + r − t‖_Ln , (1)

which measures the Ln-distance between a translated head entity h+r and some tail entity t. The Unstructured model (Bordes et al 2012) is a special case of TransE where r = 0 for all relationships.
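The TransE energy in Eq. (1) can be computed directly on toy embeddings; here n selects the L1 or L2 norm, and the vectors are illustrative rather than learned:

```python
import numpy as np

def transe_energy(h: np.ndarray, r: np.ndarray, t: np.ndarray, n: int = 2) -> float:
    """TransE energy E(h, r, t) = ||h + r - t||_Ln for toy embeddings."""
    return float(np.linalg.norm(h + r - t, ord=n))

h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t = np.array([1.0, 1.0])
print(transe_energy(h, r, t))  # 0.0: t is exactly the translation h + r
```

Lower energy means a more plausible triple; the Unstructured model's special case r = 0 reduces this to a plain distance between h and t.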

Based on the initial idea of treating two entities as a translation of one another (via their relationship) in the same embedding plane, several models have been introduced to improve the initial TransE model. The newest contributions in this line of work focus primarily on the changes in how the embedding planes are computed and/or how the embeddings are combined. For example, the entity translations in TransH (Wang et al 2014) are computed on a hyperplane that is perpendicular to the relationship embedding. In TransR (Lin et al 2015) the entities and relationships are embedded on separate planes and then the entity-vectors are translated to the relationship’s plane. Structured Embedding (SE) (Bordes et al 2011) creates two translation matrices for each relationship and applies them to head and tail entities separately. Knowledge Vault (Dong et al 2014) and HolE (Nickel, Rosasco, and Poggio 2016), on the other hand, focus on learning a new combination operator instead of simply adding two entity embeddings element-wise.
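A sketch of the TransH idea mentioned above: entities are first projected onto the relation-specific hyperplane (with unit normal w_r) and only then translated. The vectors below are toy values, and w_r is assumed unit-norm as in the original model:

```python
import numpy as np

def project_to_hyperplane(e: np.ndarray, w_r: np.ndarray) -> np.ndarray:
    """Project entity embedding e onto the hyperplane with unit normal w_r."""
    return e - (w_r @ e) * w_r

def transh_energy(h: np.ndarray, t: np.ndarray, r: np.ndarray,
                  w_r: np.ndarray, n: int = 2) -> float:
    # The translation happens between the projected entities, not the originals.
    return float(np.linalg.norm(project_to_hyperplane(h, w_r) + r
                                - project_to_hyperplane(t, w_r), ord=n))

w_r = np.array([0.0, 0.0, 1.0])  # unit normal of the relation hyperplane
h = np.array([1.0, 0.0, 5.0])    # the third component is discarded by projection
t = np.array([1.0, 1.0, -3.0])
r = np.array([0.0, 1.0, 0.0])    # r lies in the hyperplane
print(transh_energy(h, t, r, w_r))  # 0.0: the projections differ exactly by r
```

Projecting first lets the same entity take different effective positions under different relations, which plain TransE cannot express.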


References

- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G. S.; Davis, A.; Dean, J.; Devin, M.; et al. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
- Adamic, L. A., and Adar, E. 2003. Friends and neighbors on the Web. Social Networks 25(3):211–230.
- Bordes, A.; Weston, J.; Collobert, R.; and Bengio, Y. 2011. Learning Structured Embeddings of Knowledge Bases. AAAI.
- Bordes, A.; Glorot, X.; Weston, J.; and Bengio, Y. 2012. Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing. AISTATS 127–135.
- Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; and Yakhnenko, O. 2013. Translating Embeddings for Modeling Multi-relational Data. In NIPS, 2787–2795.
- Ciampaglia, G. L.; Shiralkar, P.; Rocha, L. M.; Bollen, J.; Menczer, F.; and Flammini, A. 2015. Computational fact checking from knowledge networks. PLoS ONE 10(6).
- Cucerzan, S. 2007. Large-scale named entity disambiguation based on Wikipedia data. In EMNLP-CoNLL, volume 7, 708–716.
- Dong, X.; Gabrilovich, E.; Heitz, G.; Horn, W.; Lao, N.; Murphy, K.; Strohmann, T.; Sun, S.; and Zhang, W. 2014. Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In SIGKDD.
- Galárraga, L. A.; Teflioudi, C.; Hose, K.; and Suchanek, F. 2013. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In WWW, 413–422.
- García-Durán, A.; Bordes, A.; and Usunier, N. 2015. Composing Relationships with Translations. EMNLP 286–290.
- Gong, Y.; Jia, Y.; Leung, T.; Toshev, A.; and Ioffe, S. 2013. Deep convolutional ranking for multilabel image annotation. arXiv preprint arXiv:1312.4894.
- Guillaumin, M.; Mensink, T.; Verbeek, J.; and Schmid, C. 2009. Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation. In ICCV, 309– 316. IEEE.
- Gutmann, M., and Hyvärinen, A. 2010. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In AISTATS, 297–304.
- Hachey, B.; Radford, W.; Nothman, J.; Honnibal, M.; and Curran, J. R. 2013. Evaluating entity linking with wikipedia. AI 194:130–150.
- Haveliwala, T. H. 2002. Topic-sensitive pagerank. In WWW, 517–526.
- Jean, S.; Cho, K.; Memisevic, R.; and Bengio, Y. 2015. On Using Very Large Target Vocabulary for Neural Machine Translation. ACL 1–10.
- Jeh, G., and Widom, J. 2002. SimRank: a measure of structural context similarity. In KDD, 538–543.
- Jenatton, R.; Le Roux, N.; Bordes, A.; and Obozinski, G. 2012. A latent factor model for highly multi-relational data. NIPS 3176–3184.
- Jia, Y.; Wang, Y.; Lin, H.; Jin, X.; and Cheng, X. 2016. Locally Adaptive Translation for Knowledge Graph Embedding. In AAAI.
- Kilicoglu, H.; Shin, D.; Fiszman, M.; Rosemblat, G.; and Rindflesch, T. C. 2012. SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics 28(23):3158–3160.
- Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Lao, N., and Cohen, W. W. 2010. Relational retrieval using a combination of path-constrained random walks. ML 81(1):53–67.
- Lehmann, J.; Isele, R.; and Jakob, M. 2014. DBpedia: a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 5(1):167–195.
- Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; and Zhu, X. 2015. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In AAAI, 2181–2187.
- Lin, Y.; Liu, Z.; and Sun, M. 2015. Modeling Relation Paths for Representation Learning of Knowledge Bases. EMNLP 705–714.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
- Neelakantan, A.; Roth, B.; and McCallum, A. 2015. Compositional Vector Space Models for Knowledge Base Inference. AAAI.
- Nickel, M.; Murphy, K.; Tresp, V.; and Gabrilovich, E. 2015. A review of relational machine learning for knowledge graphs: From multi-relational link prediction to automated knowledge graph construction. arXiv preprint arXiv:1503.00759.
- Nickel, M.; Rosasco, L.; and Poggio, T. 2016. Holographic Embeddings of Knowledge Graphs. In AAAI.
- Nickel, M.; Tresp, V.; and Kriegel, H.-P. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In ICML, 809–816.
- Pareek, H. H., and Ravikumar, P. K. 2014. A representation theory for ranking functions. In NIPS, 361–369.
- Shi, B., and Weninger, T. 2016. Fact checking in heterogeneous information networks. In WWW, 101–102.
- Socher, R.; Chen, D.; Manning, C. D.; and Ng, A. Y. 2013. Reasoning With Neural Tensor Networks for Knowledge Base Completion. In NIPS, 926–934.
- Unger, C.; Bühmann, L.; Lehmann, J.; Ngonga Ngomo, A.C.; Gerber, D.; and Cimiano, P. 2012. Template-based question answering over rdf data. In WWW, 639–648.
- Wang, Z.; Zhang, J.; Feng, J.; and Chen, Z. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. AAAI 1112–1119.
- Xie, R.; Liu, Z.; Jia, J.; Luan, H.; and Sun, M. 2016. Representation Learning of Knowledge Graphs with Entity Descriptions. AAAI 189–205.
- Zhou, T.; Ren, J.; Medo, M.; and Zhang, Y.-C. 2007. Bipartite network projection and personal recommendation. Phys. Rev. E 76(4):046115.
