Open Language Learning for Information Extraction
EMNLP-CoNLL, pp. 523–534, 2012
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, state-of-the-art Open IE systems such as ReVerb and WOE share two important weaknesses -- (1) they extract only relations that...
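As a concrete illustration of the tuple format the abstract describes, an Open IE extraction is a triple of text spans. The example sentence and tuple below are our own illustration, not output from an actual extractor:

```python
# Open IE output is a set of (arg1, relation phrase, arg2) tuples extracted
# from arbitrary sentences without a pre-specified relation vocabulary.
sentence = "Paul Annacone is the coach of Federer."

# A verb-mediated system like REVERB needs a verb phrase such as
# "is the coach of"; OLLIE also handles noun-mediated phrasings like
# "Federer's coach Paul Annacone".
extraction = ("Paul Annacone", "is the coach of", "Federer")

arg1, rel, arg2 = extraction
print(f"({arg1}; {rel}; {arg2})")  # → (Paul Annacone; is the coach of; Federer)
```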
- While traditional Information Extraction (IE) (ARPA, 1991; ARPA, 1998) focused on identifying and extracting specific relations of interest, there has been great interest in scaling IE to a broader set of relations and to far larger corpora (Banko et al, 2007; Hoffmann et al, 2010; Mintz et al, 2009; Carlson et al, 2010; Fader et al, 2011).
- The state-of-the-art Open IE systems, REVERB (Fader et al, 2011; Etzioni et al, 2011) and WOEparse (Wu and Weld, 2010), suffer from two key drawbacks:
- They handle a limited subset of sentence constructions for expressing relationships, and they ignore context, extracting tuples that are not asserted as factual.
- Figure 2 illustrates OLLIE’s architecture for learning and applying binary extraction patterns
- It uses a set of high precision seed tuples from REVERB to bootstrap a large training set.
- OLLIE analyzes the context around the tuple (Section 4) to add contextual information, such as attribution and clausal modifiers, and a confidence function
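The bootstrapping step in these bullets can be sketched as follows. All function names and the matching criterion are our own crude simplifications (real OLLIE matches seeds against dependency parses, not raw tokens):

```python
# Sketch of OLLIE-style bootstrapping: start from high-precision seed tuples
# (from REVERB), find sentences that contain the seed's content words, and
# collect them as noisy positive examples for pattern learning.

def matches_seed(sentence, seed):
    """A sentence is a (noisy) positive example if it mentions arg1, arg2,
    and the relation's content word."""
    arg1, rel, arg2 = seed
    tokens = sentence.lower()
    return arg1.lower() in tokens and rel.lower() in tokens and arg2.lower() in tokens

def bootstrap(seeds, corpus):
    """Collect (sentence, seed) training pairs for pattern learning."""
    return [(s, seed) for seed in seeds for s in corpus if matches_seed(s, seed)]

seeds = [("Annacone", "coach", "Federer")]
corpus = [
    "Federer hired Annacone as a coach.",
    "Annacone, the coach of Federer, spoke today.",
    "Nadal won the French Open.",
]
training = bootstrap(seeds, corpus)
print(len(training))  # the first two sentences match the seed
```

The point of matching on content words rather than exact phrasings is that the same seed tuple surfaces in many syntactic forms, which is what gives OLLIE its broader pattern inventory.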
- In this paper we present OLLIE (Open Language Learning for Information Extraction), 1 our novel Open IE system that overcomes the limitations of previous Open IE by (1) expanding the syntactic scope of relation phrases to cover a much larger number of relation expressions, and (2) expanding the Open IE representation to allow additional context information such as attribution and clausal modifiers
- Where Semantic Role Labeling (SRL) begins with a verb or noun and looks for arguments that play roles with respect to that verb or noun, Open IE looks for a phrase that expresses a relation between a pair of arguments
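The extended representation mentioned above (attribution and clausal modifiers) can be pictured as a base tuple plus optional context fields. The dataclass below is our own sketch of that idea, not OLLIE's actual data structure:

```python
from dataclasses import dataclass
from typing import Optional

# Our own sketch of OLLIE's extended representation: a plain
# (arg1, rel, arg2) tuple plus optional context fields, so a tuple from
# "Early astronomers believed that the earth is the center of the universe"
# is attributed to its source rather than asserted as bare fact.
@dataclass
class Extraction:
    arg1: str
    rel: str
    arg2: str
    attributed_to: Optional[str] = None     # e.g. "Early astronomers believed"
    clausal_modifier: Optional[str] = None  # e.g. a conditional "if ..." clause

e = Extraction("the earth", "be the center of", "the universe",
               attributed_to="Early astronomers believed")
print(e.attributed_to is not None)  # True: attributed, not asserted factual
```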
- Our work describes OLLIE, a novel Open IE extractor that makes two significant advances over the existing Open IE systems
- It expands the syntactic scope of Open IE systems by identifying relationships mediated by nouns and adjectives
- OLLIE obtains 1.9 to 2.7 times more area under precision-yield curves compared to existing state-of-the-art open extractors
- Since Open IE is designed to handle a variety of domains, the authors create a dataset of 300 random sentences from three sources: News, Wikipedia, and a Biology textbook.
- The News and Wikipedia test sets are a random subset of Wu and Weld’s test set for WOEparse.
- The authors run OLLIE, REVERB and WOEparse on this dataset, resulting in a total of 1,945 extractions from all three systems.
- All systems associate a confidence value with an extraction – ranking with these confidence values generates a precision-yield curve for this dataset.
- The authors examine a sample of the extractions to verify that noun-mediated extractions are the main reason for this large yield boost over REVERB (73% of OLLIE extractions were noun-mediated).
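The evaluation protocol in these bullets (rank extractions by confidence, then trace precision against yield) can be sketched as follows; the correctness judgments here are made up for illustration:

```python
# Sketch of a precision-yield curve: sort extractions by system confidence,
# then walk down the ranking, recording cumulative yield (extractions kept)
# and precision (fraction judged correct so far). The paper compares systems
# by the area under this curve.

def precision_yield_curve(scored):
    """scored: list of (confidence, is_correct) pairs."""
    ranked = sorted(scored, key=lambda x: x[0], reverse=True)
    curve, correct = [], 0
    for i, (_, ok) in enumerate(ranked, start=1):
        correct += ok
        curve.append((i, correct / i))  # (yield, precision)
    return curve

# Toy judgments, not real OLLIE output.
scored = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
curve = precision_yield_curve(scored)
print(curve[-1])  # (4, 0.75): at full yield, 3 of 4 extractions are correct
```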
- By analyzing the context around an extraction, OLLIE is able to identify cases where the relation is not asserted as factual, but is hypothetical or conditionally true.
- OLLIE is available for download at http://openie.cs.washington.edu
- There is a long history of bootstrapping and pattern learning approaches in traditional information extraction, e.g., DIPRE (Brin, 1998), SnowBall (Agichtein and Gravano, 2000), Espresso (Pantel and Pennacchiotti, 2006), PORE (Wang et al, 2007), SOFIE (Suchanek et al, 2009), NELL (Carlson et al, 2010), and PROSPERA (Nakashole et al, 2011). All these approaches first bootstrap data based on seed instances of a relation (or seed data from existing resources such as Wikipedia) and then learn lexical or lexico-POS patterns to create an extractor. Other approaches have extended these to learning patterns based on full syntactic analysis of a sentence (Bunescu and Mooney, 2005; Suchanek et al, 2006; Zhao and Grishman, 2005).
- OLLIE has significant differences from the previous work in pattern learning. First, and most importantly, these previous systems learn an extractor for each relation of interest, whereas OLLIE is an open extractor. OLLIE’s strength is its ability to generalize from one relation to many other relations that are expressed in similar forms. This happens both via syntactic generalization and type generalization of relation words (sections 3.2.1 and 3.2.2). This capability is essential as many relations in the test set are not even seen in the training set – in early experiments we found that non-generalized pattern learning (equivalent to traditional IE) had significantly less yield at a slightly higher precision.
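The generalization step described above can be caricatured as replacing lexical items in a learned pattern with open slots, so one pattern covers many relations. The token-level pattern encoding below is our own toy stand-in for OLLIE's dependency-path patterns:

```python
# Toy illustration of pattern generalization: a purely lexical pattern only
# matches the relation word it was learned from; a generalized pattern
# replaces that word with a slot (optionally constrained by semantic type),
# covering relations never seen in training.

def generalize(pattern, lexical_slots):
    """Replace chosen lexical tokens with an open relation slot."""
    return tuple("<REL>" if tok in lexical_slots else tok for tok in pattern)

# Learned from a phrasing like "Federer's coach Annacone":
lexical = ("ARG2", "'s", "coach", "ARG1")
general = generalize(lexical, {"coach"})
print(general)  # ('ARG2', "'s", '<REL>', 'ARG1') -- now also matches
                # noun-mediated relations like "Microsoft's co-founder ..."
```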
- This research was supported in part by NSF grant IIS-0803481, ONR grant N00014-08-1-0431, DARPA contract FA8750-09C-0179 and the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA8650-10-C-7058
- E. Agichtein and L. Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In Procs. of the Fifth ACM International Conference on Digital Libraries.
- ARPA. 1991. Proc. 3rd Message Understanding Conf. Morgan Kaufmann.
- ARPA. 1998. Proc. 7th Message Understanding Conf. Morgan Kaufmann.
- M. Banko, M. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. 2007. Open information extraction from the Web. In Procs. of IJCAI.
- S. Brin. 1998. Extracting Patterns and Relations from the World Wide Web. In WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT’98, pages 172–183, Valencia, Spain.
- Razvan C. Bunescu and Raymond J. Mooney. 2005. A shortest path dependency kernel for relation extraction. In Proc. of HLT/EMNLP.
- Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Procs. of AAAI.
- Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2011. An analysis of open information extraction based on semantic role labeling. In Proceedings of the 6th International Conference on Knowledge Capture (K-CAP ’11).
- Paul R. Cohen. 1995. Empirical Methods for Artificial Intelligence. MIT Press.
- Ido Dagan, Lillian Lee, and Fernando C. N. Pereira. 1999. Similarity-based models of word cooccurrence probabilities. Machine Learning, 34(1-3):43–69.
- Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Language Resources and Evaluation (LREC 2006).
- Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. 2011. Open information extraction: the second generation. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI ’11).
- Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of EMNLP.
- Raphael Hoffmann, Congle Zhang, and Daniel S. Weld. 2010. Learning 5000 relational extractors. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 286– 295.
- Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke S. Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In ACL, pages 541–550.
- Richard Johansson and Pierre Nugues. 2008. The effect of syntactic representation on semantic role labeling. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING 08), pages 393–400.
- Paul Kingsbury and Martha Palmer. 2002. From Treebank to PropBank. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 02).
- A. Meyers, R. Reeves, C. Macleod, R. Szekely, V. Zielinska, B. Young, and R. Grishman. 2004. Annotating Noun Argument Structure for NomBank. In Proceedings of LREC-2004, Lisbon, Portugal.
- Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In ACL-IJCNLP ’09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, pages 1003–1011.
- Ndapandula Nakashole, Martin Theobald, and Gerhard Weikum. 2011. Scalable knowledge harvesting with high precision and high recall. In Proceedings of the Fourth International Conference on Web Search and Web Data Mining (WSDM 2011), pages 227–236.
- Joakim Nivre and Jens Nilsson. 2004. Memory-based dependency parsing. In Proceedings of the Conference on Natural Language Learning (CoNLL-04), pages 49–56.
- Patrick Pantel and Marco Pennacchiotti. 2006. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (ACL’06).
- P. Resnik. 1996. Selectional constraints: an information-theoretic model and its computational realization. Cognition.
- Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In ECML/PKDD (3), pages 148–163.
- Alan Ritter, Mausam, and Oren Etzioni. 2010. A latent dirichlet allocation method for selectional preferences. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10).
- Karin Kipper Schuler. 2006. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. Ph.D. thesis, University of Pennsylvania.
- Y. Shinyama and S. Sekine. 2006. Preemptive information extraction using unrestricted relation discovery. In Procs. of HLT/NAACL.
- Fabian M. Suchanek, Georgiana Ifrim, and Gerhard Weikum. 2006. Combining linguistic and statistical analysis to extract relations from web documents. In Procs. of KDD, pages 712–717.
- Fabian M. Suchanek, Mauro Sozio, and Gerhard Weikum. 2009. SOFIE: a self-organizing framework for information extraction. In Proceedings of WWW, pages 631–640.
- Gang Wang, Yong Yu, and Haiping Zhu. 2007. PORE: Positive-only relation extraction from Wikipedia text. In Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC/ASWC’07), pages 580–594.
- Fei Wu and Daniel S. Weld. 2010. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10).
- Shubin Zhao and Ralph Grishman. 2005. Extracting relations with integrated information using kernel methods. In Procs. of ACL.
- Jun Zhu, Zaiqing Nie, Xiaojiang Liu, Bo Zhang, and Ji-Rong Wen. 2009. StatSnowball: a statistical approach to extracting entity relationships. In WWW ’09: Proceedings of the 18th International Conference on World Wide Web, pages 101–110, New York, NY, USA. ACM.