Open language learning for information extraction

EMNLP-CoNLL, pp.523-534, (2012)

Abstract

Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, state-of-the-art Open IE systems such as ReVerb and WOE share two important weaknesses -- (1) they extract only relations that...

Introduction
  • While traditional Information Extraction (IE) (ARPA, 1991; ARPA, 1998) focused on identifying and extracting specific relations of interest, there has been great interest in scaling IE to a broader set of relations and to far larger corpora (Banko et al, 2007; Hoffmann et al, 2010; Mintz et al, 2009; Carlson et al, 2010; Fader et al, 2011).
  • The state-of-the-art Open IE systems, REVERB (Fader et al, 2011; Etzioni et al, 2011) and WOEparse (Wu and Weld, 2010) suffer from two key drawbacks
  • They handle a limited subset of sentence constructions for expressing relationships.
  • Figure 2 illustrates OLLIE’s architecture for learning and applying binary extraction patterns
  • It uses a set of high precision seed tuples from REVERB to bootstrap a large training set.
  • OLLIE analyzes the context around the tuple (Section 4) to add information and a confidence function
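The bootstrapping step above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a sentence becomes a training example for a seed tuple when it contains every content word of that tuple. The function names, the stopword list, and the toy data are hypothetical and are not taken from OLLIE's code.

```python
import re

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "was", "be", "to", "by"}


def content_words(text):
    """Lowercase the text and return its non-stopword tokens as a set."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def bootstrap(seed_tuples, sentences):
    """Pair each seed tuple with every sentence containing all its content words."""
    training = []
    for arg1, rel, arg2 in seed_tuples:
        needed = content_words(f"{arg1} {rel} {arg2}")
        for sentence in sentences:
            # Subset test: every content word of the tuple occurs in the sentence.
            if needed <= content_words(sentence):
                training.append(((arg1, rel, arg2), sentence))
    return training


# Made-up seed tuple and corpus, standing in for high-precision REVERB output.
seeds = [("Paul Annacone", "is the coach of", "Federer")]
corpus = [
    "Federer hired Annacone as a coach.",  # lacks "paul", so it is skipped
    "Paul Annacone, the coach of Federer, spoke to reporters.",
    "Federer won the match.",
]
pairs = bootstrap(seeds, corpus)
for seed, sentence in pairs:
    print(seed, "<-", sentence)
```

Matching on content words rather than exact strings lets a seed tuple pick up sentences that express the same relation through a different construction, which is the kind of variety a pattern learner needs to see.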
Highlights
  • In this paper we present OLLIE (Open Language Learning for Information Extraction), our novel Open IE system that overcomes the limitations of previous Open IE by (1) expanding the syntactic scope of relation phrases to cover a much larger number of relation expressions, and (2) expanding the Open IE representation to allow additional context information such as attribution and clausal modifiers
  • Where Semantic Role Labeling (SRL) begins with a verb or noun and looks for arguments that play roles with respect to that verb or noun, Open IE looks for a phrase that expresses a relation between a pair of arguments
  • Our work describes OLLIE, a novel Open IE extractor that makes two significant advances over the existing Open IE systems
  • It expands the syntactic scope of Open IE systems by identifying relationships mediated by nouns and adjectives
  • OLLIE obtains 1.9 to 2.7 times more area under precision-yield curves compared to existing state-of-the-art open extractors
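One way to picture the expanded representation is as a binary tuple with optional context fields. The Python sketch below is an assumption-laden illustration: the class name, field names, rendering notation, and example tuples are all hypothetical and do not reflect OLLIE's actual output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extraction:
    """A binary relational tuple plus optional context (field names are hypothetical)."""
    arg1: str
    rel: str
    arg2: str
    attributed_to: Optional[str] = None     # who asserts the relation, if anyone
    clausal_modifier: Optional[str] = None  # condition under which the relation holds

    def is_asserted_as_fact(self) -> bool:
        """True only when no attribution or condition qualifies the tuple."""
        return self.attributed_to is None and self.clausal_modifier is None

    def render(self) -> str:
        """Format the tuple, appending any context in an illustrative notation."""
        text = f"({self.arg1}; {self.rel}; {self.arg2})"
        if self.attributed_to is not None:
            text += f" [attributed to: {self.attributed_to}]"
        if self.clausal_modifier is not None:
            text += f" [condition: {self.clausal_modifier}]"
        return text


# Made-up examples, not output from OLLIE.
plain = Extraction("Ada", "wrote", "a program")
attributed = Extraction("the Earth", "is the center of", "the universe",
                        attributed_to="early astronomers")
conditional = Extraction("the picnic", "will be held in", "the park",
                         clausal_modifier="if it does not rain")
for extraction in (plain, attributed, conditional):
    print(extraction.render(), extraction.is_asserted_as_fact())
```

Keeping the context in separate optional fields means a downstream consumer can still read the core tuple while filtering out anything that is attributed or conditional.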
Methods
  • Since Open IE is designed to handle a variety of domains, the authors create a dataset of 300 random sentences from three sources: News, Wikipedia, and a Biology textbook.
  • The News and Wikipedia test sets are a random subset of Wu and Weld’s test set for WOEparse.
  • The authors run OLLIE, REVERB and WOEparse on this dataset, resulting in a total of 1,945 extractions from all three systems.
  • All systems associate a confidence value with an extraction – ranking with these confidence values generates a precision-yield curve for this dataset.
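The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: yield is taken to be the number of correct extractions retained so far, the area is computed with the trapezoidal rule, and the scored extractions are made up. The summary does not state how the authors computed the area, so the helper below is hypothetical.

```python
def precision_yield_curve(extractions):
    """Build (yield, precision) points from (confidence, is_correct) pairs.

    Extractions are ranked by descending confidence. Yield is assumed here
    to mean the number of correct extractions kept at each cutoff.
    """
    ranked = sorted(extractions, key=lambda e: e[0], reverse=True)
    curve = []
    correct = 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        curve.append((correct, correct / rank))
    return curve


def area_under_curve(curve):
    """Trapezoidal area under precision as a function of yield."""
    area = 0.0
    for (y0, p0), (y1, p1) in zip(curve, curve[1:]):
        area += (y1 - y0) * (p0 + p1) / 2
    return area


# Made-up labelled extractions: (confidence, judged correct?).
scored = [(0.9, True), (0.4, True), (0.8, True), (0.6, False)]
curve = precision_yield_curve(scored)
area = area_under_curve(curve)
print(curve)
print(round(area, 4))
```

Comparing systems by area rather than by a single operating point rewards an extractor that stays precise as its confidence threshold is lowered to admit more extractions.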
Results
  • The authors examine a sample of the extractions to verify that noun-mediated extractions are the main reason for this large yield boost over REVERB (73% of OLLIE extractions were noun-mediated).
Conclusion
  • The authors' work describes OLLIE, a novel Open IE extractor that makes two significant advances over the existing Open IE systems.
  • It expands the syntactic scope of Open IE systems by identifying relationships mediated by nouns and adjectives.
  • By analyzing the context around an extraction, OLLIE is able to identify cases where the relation is not asserted as factual, but is hypothetical or conditionally true.
  • OLLIE is available for download at http://openie.cs.washington.edu
Related work
  • There is a long history of bootstrapping and pattern learning approaches in traditional information extraction, e.g., DIPRE (Brin, 1998), SnowBall (Agichtein and Gravano, 2000), Espresso (Pantel and Pennacchiotti, 2006), PORE (Wang et al, 2007), SOFIE (Suchanek et al, 2009), NELL (Carlson et al, 2010), and PROSPERA (Nakashole et al, 2011). All these approaches first bootstrap data based on seed instances of a relation (or seed data from existing resources such as Wikipedia) and then learn lexical or lexico-POS patterns to create an extractor. Other approaches have extended these to learning patterns based on full syntactic analysis of a sentence (Bunescu and Mooney, 2005; Suchanek et al, 2006; Zhao and Grishman, 2005).

    OLLIE has significant differences from the previous work in pattern learning. First, and most importantly, these previous systems learn an extractor for each relation of interest, whereas OLLIE is an open extractor. OLLIE’s strength is its ability to generalize from one relation to many other relations that are expressed in similar forms. This happens both via syntactic generalization and type generalization of relation words (sections 3.2.1 and 3.2.2). This capability is essential as many relations in the test set are not even seen in the training set – in early experiments we found that non-generalized pattern learning (equivalent to traditional IE) had significantly less yield at a slightly higher precision.
Funding
  • This research was supported in part by NSF grant IIS-0803481, ONR grant N00014-08-1-0431, DARPA contract FA8750-09C-0179 and the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA8650-10-C-7058
Reference
  • E. Agichtein and L. Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In Procs. of the Fifth ACM International Conference on Digital Libraries.
  • ARPA. 1991. Proc. 3rd Message Understanding Conf. Morgan Kaufmann.
  • ARPA. 1998. Proc. 7th Message Understanding Conf. Morgan Kaufmann.
  • M. Banko, M. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. 2007. Open information extraction from the Web. In Procs. of IJCAI.
  • S. Brin. 1998. Extracting Patterns and Relations from the World Wide Web. In WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT’98, pages 172–183, Valencia, Spain.
  • Razvan C. Bunescu and Raymond J. Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proc. of HLT/EMNLP.
  • Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Procs. of AAAI.
  • Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2011. An analysis of open information extraction based on semantic role labeling. In Proceedings of the 6th International Conference on Knowledge Capture (K-CAP ’11).
  • Paul R. Cohen. 1995. Empirical Methods for Artificial Intelligence. MIT Press.
  • Ido Dagan, Lillian Lee, and Fernando C. N. Pereira. 1999. Similarity-based models of word cooccurrence probabilities. Machine Learning, 34(1-3):43–69.
  • Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Language Resources and Evaluation (LREC 2006).
  • Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. 2011. Open information extraction: the second generation. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI ’11).
  • Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of EMNLP.
  • Raphael Hoffmann, Congle Zhang, and Daniel S. Weld. 2010. Learning 5000 relational extractors. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 286–295.
  • Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke S. Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In ACL, pages 541–550.
  • Richard Johansson and Pierre Nugues. 2008. The effect of syntactic representation on semantic role labeling. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING 08), pages 393–400.
  • Paul Kingsbury and Martha Palmer. 2002. From treebank to propbank. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 02).
  • A. Meyers, R. Reeves, C. Macleod, R. Szekely, V. Zielinska, B. Young, and R. Grishman. 2004. Annotating Noun Argument Structure for NomBank. In Proceedings of LREC-2004, Lisbon, Portugal.
  • Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In ACL-IJCNLP ’09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, pages 1003–1011.
  • Ndapandula Nakashole, Martin Theobald, and Gerhard Weikum. 2011. Scalable knowledge harvesting with high precision and high recall. In Proceedings of the Fourth International Conference on Web Search and Web Data Mining (WSDM 2011), pages 227–236.
  • Joakim Nivre and Jens Nilsson. 2004. Memory-based dependency parsing. In Proceedings of the Conference on Natural Language Learning (CoNLL-04), pages 49–56.
  • Patrick Pantel and Marco Pennacchiotti. 2006. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (ACL’06).
  • P. Resnik. 1996. Selectional constraints: an information-theoretic model and its computational realization. Cognition.
  • Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In ECML/PKDD (3), pages 148–163.
  • Alan Ritter, Mausam, and Oren Etzioni. 2010. A latent dirichlet allocation method for selectional preferences. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10).
  • Karin Kipper Schuler. 2006. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. Ph.D. thesis, University of Pennsylvania.
  • Y. Shinyama and S. Sekine. 2006. Preemptive information extraction using unrestricted relation discovery. In Procs. of HLT/NAACL.
  • Fabian M. Suchanek, Georgiana Ifrim, and Gerhard Weikum. 2006. Combining linguistic and statistical analysis to extract relations from web documents. In Procs. of KDD, pages 712–717.
  • Fabian M. Suchanek, Mauro Sozio, and Gerhard Weikum. 2009. Sofie: a self-organizing framework for information extraction. In Proceedings of WWW, pages 631–640.
  • Gang Wang, Yong Yu, and Haiping Zhu. 2007. Pore: Positive-only relation extraction from wikipedia text. In Proceedings of 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC/ASWC’07), pages 580–594.
  • Fei Wu and Daniel S. Weld. 2010. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10).
  • Shubin Zhao and Ralph Grishman. 2005. Extracting relations with integrated information using kernel methods. In Procs. of ACL.
  • Jun Zhu, Zaiqing Nie, Xiaojiang Liu, Bo Zhang, and Ji-Rong Wen. 2009. StatSnowball: a statistical approach to extracting entity relationships. In WWW ’09: Proceedings of the 18th international conference on World Wide Web, pages 101–110, New York, NY, USA. ACM.