Unsupervised Person Slot Filling based on Graph Mining

    Dian Yu

    ACL, 2016.

    Keywords:
    Chinese Slot Filling, slot type, slot filler, Text Analysis Conference Knowledge Base Population, Cold Start Slot Filling

    Abstract:

    Slot filling aims to extract the values (slot fillers) of specific attributes (slot types) for a given entity (query) from a large-scale corpus. Slot filling has remained very challenging over the past seven years. We propose a simple yet effective unsupervised approach to extract slot fillers based on the following two observations: (1) a tri...

    Introduction
    • The goal of the Text Analysis Conference Knowledge Base Population (TAC-KBP) Slot Filling (SF) task (McNamee and Dang, 2009; Ji et al, 2010; Ji et al, 2011; Surdeanu and Ji, 2014) is to extract the values of specific attributes for a given entity from a large-scale corpus and provide justification sentences to support these slot fillers.
    • Given a person query “Dominick Dunne” and slot type spouse, an SF system may extract a slot filler “Ellen Griffin” and its justification sentence E1 as shown in Figure 1.
    • E1: Ellen Griffin Dunne, from whom he was divorced in 1965, died in 1997.
    • [Figure 1: dependency graph of E1. Nodes: Ellen Griffin Dunne (Person), divorced, whom, from, he. Edge labels: nmod, nsubjpass, case, coreference.]
    • Treating any pair of query and candidate slot filler as an instance, existing approaches train a classifier on manually labeled data obtained through active learning (Angeli et al, 2014b) or on noisily labeled data obtained through distant supervision (Angeli et al, 2014a; Surdeanu et al, 2010) to predict whether a specific relation holds between them
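    The query, slot type, filler and justification in the example above can be pictured as one small record. The following is a minimal sketch only: the class name `SlotFill`, its field names and the helper `filler_in_justification` are hypothetical and are not taken from the paper or from the TAC-KBP tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotFill:
    """One slot-filling answer (hypothetical layout, for illustration only)."""
    query: str          # the entity being asked about
    slot_type: str      # the attribute to fill
    filler: str         # the extracted value
    justification: str  # the sentence that supports the filler


# Sentence E1 from the example above.
E1 = "Ellen Griffin Dunne, from whom he was divorced in 1965, died in 1997."

fill = SlotFill(
    query="Dominick Dunne",
    slot_type="spouse",
    filler="Ellen Griffin",
    justification=E1,
)


def filler_in_justification(answer: SlotFill) -> bool:
    """Sanity check: the filler string should occur in its justification."""
    return answer.filler in answer.justification


print(filler_in_justification(fill))  # prints True
```

    Note that the query name itself never appears in E1; the sentence only refers to the query through the pronoun "he", which is why the coreference link in Figure 1 matters.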
    Highlights
    • The goal of the Text Analysis Conference Knowledge Base Population (TAC-KBP) Slot Filling (SF) task (McNamee and Dang, 2009; Ji et al, 2010; Ji et al, 2011; Surdeanu and Ji, 2014) is to extract the values of specific attributes for a given entity from a large-scale corpus and provide justification sentences to support these slot fillers
    • Based on the released evaluation queries from KBP2015 Cold Start Slot Filling, our approach achieves 39.2% overall F-score on 18 person trigger-driven slot types, which is significantly better than the state of the art (Angeli et al, 2015) on the same set of news documents (Table 4)
    • We demonstrate the importance of deep mining of dependency structures for slot filling
    • Our approach outperforms the state of the art and can be rapidly ported to a new language or a new slot type, as long as name tagging, POS tagging, dependency parsing and trigger gazetteers are available
    • In the future we aim to label slot types based on contextual information as well as sentence structures instead of trigger gazetteers only
    • A trigger can serve multiple slot types
    Methods
    • Given (1) a name tagger, (2) a POS tagger, (3) a dependency parser and (4) slot-specific trigger gazetteers, the authors can apply the framework to a new language.
    • The authors demonstrate the portability of the framework to Chinese since all the resources mentioned above are available.
    • The authors use the full set of Chinese trigger gazetteers published by Yu et al (2015).
    • Experimental results (Table 7) demonstrate that the approach can serve as a new and promising benchmark.
    • As far as the authors know, there are no results available for comparison
    Results
    • The authors' approach achieves 11.6%-25% higher F-score than state-of-the-art English slot filling methods.
    • Based on the released evaluation queries from KBP2015 Cold Start Slot Filling, the approach achieves 39.2% overall F-score on 18 person trigger-driven slot types, which is significantly better than the state of the art (Angeli et al, 2015) on the same set of news documents (Table 4).
    • The authors' approach outperforms the state of the art and can be rapidly ported to a new language or a new slot type, as long as name tagging, POS tagging, dependency parsing and trigger gazetteers are available
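    For reference, the F-scores quoted above (reported as F1 in the tables) are conventionally the harmonic mean of precision and recall. The symbols P, R and the count names below are introduced here for illustration; this summary does not define them, and the official KBP scorer applies further rules (such as redundancy handling) that this sketch leaves out.

```latex
P = \frac{\#\,\text{correct fillers returned}}{\#\,\text{fillers returned}},
\qquad
R = \frac{\#\,\text{correct fillers returned}}{\#\,\text{gold fillers}},
\qquad
F_1 = \frac{2PR}{P + R}
```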
    Conclusion
    • In this paper, the authors demonstrate the importance of deep mining of dependency structures for slot filling.
    • The authors' approach outperforms the state of the art and can be rapidly ported to a new language or a new slot type, as long as name tagging, POS tagging, dependency parsing and trigger gazetteers are available.
    • In the future the authors aim to label slot types based on contextual information as well as sentence structures instead of trigger gazetteers only.
    • A trigger can serve multiple slot types.
    • A trigger word can have multiple different meanings.
    • The authors will attempt to combine multi-prototype approaches (e.g., Reisinger and Mooney, 2010) to better disambiguate the senses of trigger words
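    The point that one trigger can serve several slot types can be illustrated with a toy gazetteer lookup. This is a sketch under stated assumptions, not the authors' gazetteers: the entries, the slot-type names and the functions `candidate_slot_types` and `ambiguous_triggers` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical gazetteer: trigger word -> slot types it may signal.
# Entries are invented for illustration; they are not from the paper.
GAZETTEER: Dict[str, Set[str]] = {
    "divorced": {"spouse"},
    "born": {"date_of_birth", "city_of_birth"},
}


def candidate_slot_types(
    tokens: List[str], gazetteer: Dict[str, Set[str]] = GAZETTEER
) -> Dict[str, List[str]]:
    """Map each trigger found in the sentence to its possible slot types."""
    found: Dict[str, List[str]] = {}
    for token in tokens:
        key = token.lower()
        if key in gazetteer:
            found[key] = sorted(gazetteer[key])
    return found


def ambiguous_triggers(gazetteer: Dict[str, Set[str]] = GAZETTEER) -> List[str]:
    """Triggers that serve more than one slot type and so need disambiguation."""
    return sorted(t for t, slots in gazetteer.items() if len(slots) > 1)


tokens = "Ellen Griffin Dunne , from whom he was divorced in 1965 , died in 1997 .".split()
print(candidate_slot_types(tokens))  # {'divorced': ['spouse']}
print(ambiguous_triggers())          # ['born']
```

    A lookup of this kind returns every slot type attached to a trigger, which is why labeling slot types from gazetteers alone leaves ambiguous triggers unresolved.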
    Tables
    • Table 1: Dependency patterns for slot spouse
    • Table 2: PPDB-based trigger expansion examples
    • Table 3: English Slot Filling F1 (%) (KBP2013 SF data set)
    • Table 4: English Cold Start Slot Filling F1 (%) (KBP2015 CSSF data set)
    • Table 5: Examples for new slot types
    • Table 6: The effect of trigger gazetteers on ESF (size: the number of triggers)
    • Table 7: Chinese Slot Filling F1 (%) (KBP2015 CSF data set)
    Related work
    • Besides the methods based on distant supervision (e.g., Surdeanu et al, 2010; Roth et al, 2013; Angeli et al, 2014b) discussed in Section 6.2, pattern-based methods have also proven effective for SF in past years (Sun et al, 2011; Li et al, 2012; Yu et al, 2013). Dependency-based patterns achieve better performance since they can capture long-distance relations. Most of these approaches assume that a relation exists between a query Q and a filler F if a dependency path connects Q and F, and they regard all the words on the path equally as trigger candidates. We explore the complete graph structure of a sentence rather than chains/subgraphs as in previous work. Our previous research focused on identifying the relation between F and a trigger T by extracting filler candidates from the identified scope of a trigger (e.g., Yu et al, 2015). We found that each slot-specific trigger has its own scope, and corresponding fillers seldom appear outside its scope. We did not compare with results from this previous approach, which did not consider the redundancy removal required in the official evaluations.
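    The path-based assumption attributed above to earlier work (every word on the dependency path between Q and F is a trigger candidate) can be sketched with a breadth-first search. This illustrates that prior assumption only, not the authors' graph-mining method. The edge list is an assumed reading of the dependency graph for sentence E1, the `nsubj` and `acl:relcl` edges are guesses, and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Assumed dependency edges for E1 as (head, dependent, label).
# "Dunne" stands for the name "Ellen Griffin Dunne"; "he" is the
# mention that corefers with the query.
EDGES: List[Tuple[str, str, str]] = [
    ("died", "Dunne", "nsubj"),          # assumed
    ("Dunne", "divorced", "acl:relcl"),  # assumed
    ("divorced", "whom", "nmod"),
    ("whom", "from", "case"),
    ("divorced", "he", "nsubjpass"),
]


def build_graph(edges: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Build an undirected adjacency list, ignoring edge labels."""
    graph: Dict[str, List[str]] = {}
    for head, dependent, _label in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)
    return graph


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the node sequence or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def trigger_candidates(graph: Dict[str, List[str]], query: str, filler: str) -> List[str]:
    """Words strictly between query and filler on the shortest path."""
    path = shortest_path(graph, query, filler)
    return [] if path is None else path[1:-1]


graph = build_graph(EDGES)
print(trigger_candidates(graph, "he", "Dunne"))  # ['divorced']
```

    On this toy graph the only word between the query mention and the filler is "divorced". On longer paths the same procedure would return every intermediate word with equal standing, which is the limitation the paragraph above points out.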
    Funding
    • This work was supported by the DARPA LORELEI Program No. HR0011-15-C-0115, DARPA DEFT Program No. FA8750-13-2-0041, ARL NS-CTA No. W911NF-09-2-0053, and NSF CAREER Award IIS-1523198
    Reference
    • G. Angeli, S. Gupta, M. Jose, C. Manning, C. Re, J. Tibshirani, J. Wu, S. Wu, and C. Zhang. 2014a. Stanford’s 2014 slot filling systems. In Proc. Text Analysis Conference (TAC 2014).
    • G. Angeli, J. Tibshirani, J. Wu, and C. Manning. 2014b. Combining distant and partial supervision for relation extraction. In Proc. Empirical Methods on Natural Language Processing (EMNLP 2014).
    • G. Angeli, V. Zhong, D. Chen, J. Bauer, A. Chang, V. Spitkovsky, and C. Manning. 2015. Bootstrapped self training for knowledge base population. In Proc. Text Analysis Conference (TAC 2015).
    • O. Bronstein, I. Dagan, Q. Li, H. Ji, and A. Frank. 2015. Seed-based event trigger labeling: How far can event descriptions get us? In Proc. Association for Computational Linguistics (ACL 2015).
    • W. Che, Z. Li, and T. Liu. 2010. LTP: A Chinese language technology platform. In Proc. Computational Linguistics (COLING 2010).
    • B. Frey and D. Dueck. 2007. Clustering by passing messages between data points. Science.
    • J. Ganitkevitch, B. Van Durme, and C. Callison-Burch. 2013. PPDB: The paraphrase database. In Proc. North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT 2013).
    • Y. Hong, X. Wang, Y. Chen, J. Wang, T. Zhang, J. Zheng, D. Yu, and Q. Li. 2014. RPI BLENDER TAC-KBP2014 knowledge base population system. In Proc. Text Analysis Conference (TAC 2014).
    • G. Jeh and J. Widom. 2003. Scaling personalized web search. In Proc. World Wide Web (WWW 2003).
    • H. Ji, R. Grishman, H. Dang, K. Griffitt, and J. Ellis. 2010. An overview of the TAC2010 knowledge base population track. In Proc. Text Analysis Conference (TAC 2010).
    • H. Ji, R. Grishman, and H. Dang. 2011. An overview of the TAC2011 knowledge base population track. In Proc. Text Analysis Conference (TAC 2011).
    • R. Levy and C. Manning. 2003. Is it harder to parse Chinese, or the Chinese treebank? In Proc. Association for Computational Linguistics (ACL 2003).
    • Y. Li, S. Chen, Z. Zhou, J. Yin, H. Luo, L. Hong, W. Xu, G. Chen, and J. Guo. 2012. PRIS at TAC2012 KBP track. In Proc. Text Analysis Conference (TAC 2012).
    • C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, and D. McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proc. Association for Computational Linguistics (ACL 2014).
    • P. McNamee and H. Dang. 2009. Overview of the TAC 2009 knowledge base population track. In Proc. Text Analysis Conference (TAC 2009).
    • R. Motwani and P. Raghavan. 1996. Randomized algorithms. ACM Computing Surveys (CSUR).
    • L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.
    • E. Pavlick, P. Rastogi, J. Ganitkevitch, B. Van Durme, and C. Callison-Burch. 2015. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proc. Association for Computational Linguistics (ACL 2015).
    • J. Reisinger and R. Mooney. 2010. Multi-prototype vector-space models of word meaning. In Proc. North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT 2010).
    • B. Roth, T. Barth, M. Wiegand, M. Singh, and D. Klakow. 2013. Effective slot filling based on shallow distant supervision methods. In Proc. Text Analysis Conference (TAC 2013).
    • S. Soderland, J. Gilmer, R. Bart, O. Etzioni, and D. Weld. 2013. Open IE to KBP relations in 3 hours. In Proc. Text Analysis Conference (TAC 2013).
    • A. Sun, R. Grishman, B. Min, and W. Xu. 2011. NYU 2011 system for KBP slot filling. In Proc. Text Analysis Conference (TAC 2011).
    • M. Surdeanu and H. Ji. 2014. Overview of the English slot filling track at the TAC2014 knowledge base population evaluation. In Proc. Text Analysis Conference (TAC 2014).
    • M. Surdeanu, D. McClosky, J. Tibshirani, J. Bauer, A. Chang, V. Spitkovsky, and C. Manning. 2010. A simple distant supervision approach for the TAC-KBP slot filling task. In Proc. Text Analysis Conference (TAC 2010).
    • M. Wang, W. Che, and C. Manning. 2013. Joint word alignment and bilingual named entity recognition using dual decomposition. In Proc. Association for Computational Linguistics (ACL 2013).
    • S. White and P. Smyth. 2003. Algorithms for estimating relative importance in networks. In Proc. Knowledge Discovery and Data Mining (KDD 2003).
    • D. Yu, H. Li, T. Cassidy, Q. Li, H. Huang, Z. Chen, H. Ji, Y. Zhang, and D. Roth. 2013. RPI-BLENDER TAC-KBP2013 knowledge base population system. In Proc. Text Analysis Conference (TAC 2013).
    • D. Yu, H. Ji, S. Li, and C. Lin. 2015. Why read if you can scan: Scoping strategy for biographical fact extraction. In Proc. North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT 2015).