Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment

In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 6355-6364, 2020.

Keywords:
experimental setting; DBpedia and YAGO; entity alignment; EA method; equivalent entity
TL;DR:
We propose a novel Entity Alignment model and contribute a hard experimental setting for practical evaluation.

Abstract:

Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs. GNN-based EA methods present promising performance by modeling the KG structure defined by relation triples. However, attribute triples can also provide crucial alignment signal but have not been well explored yet. In this paper, we propose to utilize a value encoder and partition the KG into subgraphs to model the various types of attribute triples efficiently. Besides, the performances of current EA methods are overestimated because of the name-bias of existing EA datasets. To make an objective evaluation, we propose a hard experimental setting where we select equivalent entity pairs with very different names as the test set. Under both the regular and hard settings, our method achieves significant improvements over twelve baselines on cross-lingual and monolingual datasets.

Introduction
  • The prosperity of data mining has spawned Knowledge Graphs (KGs) in many domains that are often complementary to each other.
  • Most of the previous EA models (Sun et al., 2017; Wang et al., 2018; Wu et al., 2019a) rely on the structure assumption that the neighborhoods of two equivalent entities in different KGs usually contain equivalent entities (Wang et al., 2018) (see Figure 1(a))
  • These models mainly focus on modeling KG structure defined by the relation triples.
  • The authors have identified the challenges of attribute incorporation and dataset bias
Highlights
  • The prosperity of data mining has spawned Knowledge Graphs (KGs) in many domains that are often complementary to each other
  • A KG comprises a set of triples, with each triple consisting of a subject, predicate, and object
  • Most of the previous Entity Alignment (EA) models (Sun et al., 2017; Wang et al., 2018; Wu et al., 2019a) rely on the structure assumption that the neighborhoods of two equivalent entities in different KGs usually contain equivalent entities (Wang et al., 2018) (see Figure 1(a)). These models mainly focus on modeling the KG structure defined by the relation triples
  • To address the first challenge, we propose the Attributed Graph Neural Network (AttrGNN) to learn attribute triples and relation triples in a unified network, and to learn the importance of each attribute and value dynamically (see the sketch after this list)
  • We report results in two settings: the regular setting, i.e., the setting used in previous entity alignment works; and the hard setting, where we construct a harder test set for objective evaluation
  • Our full model significantly outperforms the Structure channel and A w/o Relation, the variants that use only relation features or only attribute features, respectively
  • We propose a novel EA model (AttrGNN) and contribute a hard experimental setting for practical evaluation
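
A minimal sketch of the attribute-value attention idea described above, assuming pre-computed attribute and value embeddings; the module name, shapes, and scoring function are illustrative assumptions, not the authors' exact layer:

    import torch
    import torch.nn as nn

    class AttrValueAttention(nn.Module):
        """Aggregate an entity's attribute-value pairs with learned attention,
        so that informative attributes receive higher weights dynamically."""

        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(3 * dim, 1)  # scores an (entity, attribute, value) triple

        def forward(self, ent, attr, val, mask):
            # ent: [B, d]; attr, val: [B, K, d]; mask: [B, K], 1 marks a real pair
            ent_k = ent.unsqueeze(1).expand_as(attr)                        # [B, K, d]
            logits = self.score(torch.cat([ent_k, attr, val], dim=-1)).squeeze(-1)
            logits = logits.masked_fill(mask == 0, float("-inf"))           # ignore padding
            alpha = torch.softmax(logits, dim=-1)                           # per-pair weights
            return (alpha.unsqueeze(-1) * val).sum(dim=1)                   # [B, d] summary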
Methods
  • The authors compute the similarity matrix S via NameBERT; each element S(e, e′) denotes the name similarity between an entity pair (e, e′), with e from the source KG and e′ from the target KG. Second, the authors rank the gold equivalent entity pairs so that a pair (e, e′) ranks higher when the names of its entities are less similar (see the sketch after this list).
  • The authors pick the highest-ranked 60% of equivalent entity pairs as the test set.
  • The authors construct the harder test set for the cross-lingual datasets only, because it is impractical to find equivalent entity pairs whose entities have very different names on the monolingual datasets, as shown by the performance of NameBERT in Table 4
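
As a concrete illustration of this selection procedure, here is a minimal sketch assuming name_sim[i] holds the NameBERT name similarity of the i-th gold pair; the function and variable names are hypothetical:

    import numpy as np

    def build_hard_test_set(pairs, name_sim, test_ratio=0.6):
        """Split gold alignments so the least name-similar pairs form the test set.

        pairs:    list of (source_entity, target_entity) gold alignments.
        name_sim: array where name_sim[i] is the NameBERT similarity of pairs[i].
        """
        order = np.argsort(name_sim)               # ascending: least similar first
        n_test = int(len(pairs) * test_ratio)      # hardest 60% by default
        test = [pairs[i] for i in order[:n_test]]
        train = [pairs[i] for i in order[n_test:]]
        return train, test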
Results
  • The authors' full model significantly outperforms the Structure channel and A w/o Relation, the variants that use only relation features or only attribute features, respectively.
Conclusion
  • The authors implement AttrGNN and the eight best-performing baselines, using their released source code, under the hard setting.
  • The authors observe a general performance drop in Hits@1 on DBP15k for all models, as shown in Figure 3.
  • AttrGNN still achieves the best performance, demonstrating the effectiveness of the model.
  • The performance of AttrGNN degrades by around 6% in Hits@1.
  • This degradation indicates that the practical application of EA is still challenging and worth further exploration.
  • In future work, the authors plan to enhance BERT with knowledge-enhanced and number-sensitive text representation models (Cao et al., 2017; Geva et al., 2020).
Summary
  • Objectives:

    The equivalent entities e and e′ share the attribute Area with similar values of 153,909 and 154,077; the authors therefore aim to improve EA using attribute triples.
  • The authors also aim to carry out a more objective evaluation of EA models on a harder test set
Tables
  • Table 1: Triple numbers of the datasets. #Relation indicates the number of relation triples. The numbers of attribute triples with digital values and literal values are denoted by #Digital and #Literal
  • Table 2: Characteristics of entity alignment models. The top part lists 8 models that do not utilize entity names, and the bottom part lists 5 models that do. Attr and Value indicate the attributes and values from attribute triples; Name indicates entity names; and Iter indicates whether the model iteratively enlarges the training set of equivalent entities
  • Table 3: Overall performance on the regular setting of DBP15k. Models in the first part do not use entity names, while models in the second part do. * indicates results from our re-implementation using their source code
  • Table 4: Overall performance on DWY100K. The performance of AttrE is reported in Zhang et al. (2019)
  • Table 5: Overall performance on the hard setting of DBP15k
  • Table 6: Attributes and values for the entity “Georgia (U.S. state)” from the English and Chinese DBpedia. Attributes are sorted in descending order according to the attention score. Chinese texts are translated
Related work
  • Recent entity alignment methods can be classified into embedding-based methods and Graph Neural Network-based (GNN-based) methods.

    2.1 Embedding-based Methods

    Recent works utilize KG embedding methods, such as TransE (Bordes et al., 2013), to model the relation triples, and further unify the two KG embedding spaces by forcing seed entities to be close (Chen et al., 2017); a generic sketch of the TransE score is given below. Attribute triples have also been introduced in this field. JAPE (Sun et al., 2017) computes attribute similarity to regularize the structure-based optimization. KDCoE (Chen et al., 2018) co-trains entity description and structure embeddings with a shared, iteratively enlarged seed set. AttrE (Trisedya et al., 2019) embeds attribute values to unify the embedding spaces of the two KGs.

    [Figure: overview of the AttrGNN architecture, which partitions KG1 and KG2 into subgraphs (GC1, GC2, GC3) and encodes attribute values with a value encoder.]
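
For context, the TransE objective that these embedding-based methods build on scores a relation triple (h, r, t) by how well the relation embedding translates the head to the tail; a generic sketch, not any one paper's implementation:

    import torch

    def transe_score(h, r, t, p=1):
        """TransE plausibility score: ||h + r - t||_p, lower means more plausible.

        h, r, t: [B, d] embeddings of heads, relations, and tails."""
        return torch.norm(h + r - t, p=p, dim=-1)

Margin-based training then pushes the scores of corrupted (negative) triples above those of observed triples by a fixed margin.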
Funding
  • This research is supported by the National Research Foundation, Singapore under its International Research Centres in Singapore Funding Initiative
Study subjects and analysis
negative samples: 25
For each entity, we keep at most 20 or 3 attribute triples, depending on GPU memory. For Graph Alignment, we choose 25 negative samples for each entity. We use 16 negative samples for each positive sample in the SVM ensemble model
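
As a rough illustration of the ensemble step, the sketch below assembles SVM training data from per-channel similarity scores, pairing each positive alignment with 16 sampled negatives; channel_sims and the sampling scheme are assumptions, not the authors' exact pipeline:

    import random
    import numpy as np
    from sklearn.svm import SVC

    def make_svm_training_data(pos_pairs, target_entities, channel_sims, n_neg=16):
        """Build (features, labels) for the ensemble SVM.

        channel_sims(e, e2) -> vector of per-channel similarities for a pair,
        e.g., from the structure, literal, digital, and name channels.
        """
        X, y = [], []
        for e, e_pos in pos_pairs:
            X.append(channel_sims(e, e_pos)); y.append(1)
            candidates = [t for t in target_entities if t != e_pos]
            for e_neg in random.sample(candidates, n_neg):   # 16 negatives per positive
                X.append(channel_sims(e, e_neg)); y.append(0)
        return np.array(X), np.array(y)

    # The fitted SVM then combines the channel similarities into one score:
    # clf = SVC(probability=True).fit(*make_svm_training_data(...))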

negative samples: 16
For Graph Alignment, we choose 25 negative samples for each entity. We use 16 negative samples for each positive sample in the SVM ensemble model. We grid search the best parameters for each GNN channel on the validation set (if available) over the following ranges: learning rate {0.001, 0.004, 0.007}, L2 regularization {10^-4, 10^-3, 0}
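
The grid search itself can be as simple as iterating over the cross-product of the listed values; train_and_eval below is a placeholder for training one GNN channel and returning its validation Hits@1:

    from itertools import product

    LEARNING_RATES = [0.001, 0.004, 0.007]
    L2_WEIGHTS = [1e-4, 1e-3, 0.0]

    def grid_search(train_and_eval):
        """Return the (lr, l2) setting with the best validation score."""
        best, best_score = None, float("-inf")
        for lr, l2 in product(LEARNING_RATES, L2_WEIGHTS):
            score = train_and_eval(lr=lr, l2=l2)  # e.g., validation Hits@1
            if score > best_score:
                best, best_score = (lr, l2), score
        return best, best_score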

cross-lingual datasets: 3
We test models on both cross-lingual and monolingual datasets: DBP15k (Sun et al., 2017) and DWY100k (Sun et al., 2018). DBP15k includes three cross-lingual datasets collected from DBpedia: Chinese and English (DBPZH-EN), Japanese and English (DBPJA-EN), and French and English (DBPFR-EN). DWY100k contains two monolingual datasets: DBpedia and Wikidata (DBP-WD), and DBpedia and YAGO (DBP-YG)

major observations: 3
The overall performance is similar to that on DBP15k, on which AttrGNN achieves the best performance. There are three major observations: 1. NameBERT achieves nearly 100% Hits@1 on DBP-YG, which shows a more severe name-bias than on the cross-lingual datasets

major observations: 3
We observe a general performance drop in Hits@1 on DBP15k for all models, as shown in Figure 3. There are three major observations: 1. AttrGNN still achieves the best performance, demonstrating the effectiveness of our model

major observations: 3
A w/o Relation ensembles NameBERT with one-layer Literal and Digital channels. There are three major observations: 1. The Literal and Structure channels’ performances are close to the Name channel under the hard setting

References
  • Daniel Andor, Luheng He, Kenton Lee, and Emily Pitler. 2019. Giving BERT a calculator: Finding operations and arguments with reading comprehension. In EMNLP.
  • Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NIPS.
  • Yixin Cao, Lei Hou, Juanzi Li, and Zhiyuan Liu. 2018. Neural collective entity linking. In COLING.
  • Yixin Cao, Lifu Huang, Heng Ji, Xu Chen, and Juanzi Li. 2017. Bridge text and knowledge by learning multi-prototype entity mention embedding. In ACL.
  • Yixin Cao, Zhiyuan Liu, Chengjiang Li, Juanzi Li, and Tat-Seng Chua. 2019a. Multi-channel graph neural network for entity alignment. In ACL.
  • Yixin Cao, Xiang Wang, Xiangnan He, Zikun Hu, and Tat-Seng Chua. 2019b. Unifying knowledge graph learning and recommendation: Towards a better understanding of user preferences. In WWW.
  • Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, and Carlo Zaniolo. 2018. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. In IJCAI.
  • Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. 2017. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In IJCAI.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
  • John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. JMLR.
  • Michael R. Garey and David S. Johnson. 1990. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co.
  • Mor Geva, Ankit Gupta, and Jonathan Berant. 2020. Injecting numerical reasoning skills into language models. In ACL.
  • Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NeurIPS.
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR.
  • Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
  • Shantanu Kumar. 2017. A survey of deep learning methods for relation extraction. arXiv preprint.
  • Chengjiang Li, Yixin Cao, Lei Hou, Jiaxin Shi, Juanzi Li, and Tat-Seng Chua. 2019. Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model. In EMNLP.
  • Zequn Sun, Wei Hu, and Chengkai Li. 2017. Cross-lingual entity alignment via joint attribute-preserving embedding. In ISWC.
  • Zequn Sun, Wei Hu, Qingheng Zhang, and Yuzhong Qu. 2018. Bootstrapping entity alignment with knowledge graph embedding. In IJCAI.
  • Johan A. K. Suykens and Joos Vandewalle. 1999. Least squares support vector machine classifiers. Neural Processing Letters.
  • Bayu Distiawan Trisedya, Jianzhong Qi, and Rui Zhang. 2019. Entity alignment between knowledge graphs using attribute embeddings. In AAAI.
  • Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In ICLR.
  • Zhichun Wang, Qingsong Lv, Xiaohan Lan, and Yu Zhang. 2018. Cross-lingual knowledge graph alignment via graph convolutional networks. In EMNLP.
  • Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Rui Yan, and Dongyan Zhao. 2019a. Relation-aware entity alignment for heterogeneous knowledge graphs. In IJCAI.
  • Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, and Dongyan Zhao. 2019b. Jointly learning entity and relation representations for entity alignment. In EMNLP.
  • Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, and Dongyan Zhao. 2020. Neighborhood matching network for entity alignment. In ACL.
  • Kun Xu, Liwei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, and Dong Yu. 2019. Cross-lingual knowledge graph alignment via graph matching neural network. In ACL.
  • Junchi Yan, Xu-Cheng Yin, Weiyao Lin, Cheng Deng, Hongyuan Zha, and Xiaokang Yang. 2016. A short survey of recent advances in graph matching. In ICMR.
  • Shuo Yang, Lei Zou, Zhongyuan Wang, Jun Yan, and Ji-Rong Wen. 2017. Efficiently answering technical questions: a knowledge graph approach. In AAAI.
  • Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In EMNLP.
  • Qingheng Zhang, Zequn Sun, Wei Hu, Muhao Chen, Lingbing Guo, and Yuzhong Qu. 2019. Multi-view knowledge graph embedding for entity alignment. In IJCAI.
  • Hao Zhu, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2017. Iterative entity alignment via joint knowledge embeddings. In IJCAI.