Summary: the first approach that models all aspects of the multi-instance multi-label setting, i.e., the latent assignment of labels to instances and dependencies between labels assigned to the same entity pair.

Multi-instance multi-label learning for relation extraction

EMNLP-CoNLL, (2012)


Abstract

Distant supervision for relation extraction (RE) -- gathering training data by aligning a database of facts with text -- is an efficient approach to scale RE to thousands of different relations. However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. For e...

Introduction
  • Information extraction (IE), defined as the task of extracting structured information from free text, has received renewed interest in the “big data” era, when petabytes of natural-language text containing thousands of different structure types are readily available.
  • Traditional supervised methods are unlikely to scale in this context, as training data is either limited or nonexistent for most of these structures.
  • One of the most promising approaches to IE that addresses this limitation is distant supervision, which generates training data automatically by aligning a database of facts with text.
  • Example sentence: Barack Obama is the 44th and current President of the United States.
  • Example sentence: United States President Barack Obama meets with Chinese Vice President Xi Jinping today.
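The alignment step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the `Fact` type, the `align` helper, and the relation names `BornIn` and `EmployedBy` are assumptions made for the example, and plain substring matching stands in for the entity recognition a real system would need.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One database fact: relation(entity1, entity2). Hypothetical schema."""
    relation: str
    entity1: str
    entity2: str


def align(facts, sentences):
    """Label every sentence that mentions both entities of a fact
    with that fact's relation (naive substring matching)."""
    examples = []
    for sentence in sentences:
        for fact in facts:
            if fact.entity1 in sentence and fact.entity2 in sentence:
                examples.append(
                    (sentence, fact.entity1, fact.entity2, fact.relation)
                )
    return examples


# Illustrative facts; the relation names are assumptions for this sketch.
facts = [
    Fact("BornIn", "Barack Obama", "United States"),
    Fact("EmployedBy", "Barack Obama", "United States"),
]
sentences = [
    "Barack Obama is the 44th and current President of the United States.",
    "United States President Barack Obama meets with Chinese Vice President Xi Jinping today.",
]

training_examples = align(facts, sentences)
for sentence, e1, e2, relation in training_examples:
    print(f"{relation}({e1}, {e2}) <- {sentence}")
```

Both sentences receive both labels, although neither states where Obama was born. That is the noise distant supervision introduces: the relation actually expressed by a given mention is unknown.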
Highlights
  • The first decision is necessary because the gold KBP answers contain supporting documents only from the corpus provided by the organizers, whereas we retrieve candidate answers from multiple collections
  • The second is required because the focus of this work is not on sentence retrieval but on relation extraction (RE), which should be evaluated in isolation
  • In this paper we showed that distant supervision for RE, which generates training data by aligning a database of facts with text, poses a distinct multi-instance multi-label learning scenario
  • To our knowledge, this is the first approach that models all aspects of the multi-instance multi-label (MIML) setting, i.e., the latent assignment of labels to instances and dependencies between labels assigned to the same entity pair
  • The percentage of mentions that do not express their relation is up to 31% in one corpus and up to 39% in the other
  • Our model performs well even when not all aspects of the MIML scenario are common, and as seen in the discussion, shows significant improvement when evaluated on entity pairs with many labels or mentions
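To make the MIML setting concrete, the sketch below shows one possible data layout: mentions are grouped by entity pair, each group carries the set of labels from the database, and the per-mention label stays unknown. The function name `group_by_entity_pair`, the dictionary layout, and the relation names are assumptions for illustration; this is not the model described in the paper, only the shape of its input.

```python
from collections import defaultdict


def group_by_entity_pair(examples):
    """Collect the mentions and database labels for each entity pair."""
    groups = defaultdict(lambda: {"mentions": [], "labels": set()})
    for sentence, entity1, entity2, relation in examples:
        group = groups[(entity1, entity2)]
        if sentence not in group["mentions"]:
            group["mentions"].append(sentence)
        group["labels"].add(relation)
    return dict(groups)


sentence_a = "Barack Obama is the 44th and current President of the United States."
sentence_b = ("United States President Barack Obama meets with "
              "Chinese Vice President Xi Jinping today.")

# Distantly supervised examples; relation names are hypothetical.
examples = [
    (sentence_a, "Barack Obama", "United States", "BornIn"),
    (sentence_a, "Barack Obama", "United States", "EmployedBy"),
    (sentence_b, "Barack Obama", "United States", "BornIn"),
    (sentence_b, "Barack Obama", "United States", "EmployedBy"),
]

groups = group_by_entity_pair(examples)
pair = groups[("Barack Obama", "United States")]

# Multi-instance: several mentions per pair. Multi-label: several relations per pair.
# Which mention expresses which relation (if any) is latent, so it starts as None.
latent_assignment = {mention: None for mention in pair["mentions"]}

print(len(pair["mentions"]), "mentions,", len(pair["labels"]), "labels")
```

A learner in this setting has to infer the `None` values while respecting the pair-level label set, which is what separates it from ordinary per-sentence classification.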
Results
  • The first dataset was developed by Riedel et al. (2010) by aligning Freebase relations with the New York Times (NYT) corpus.
  • Some relations extracted during testing will be incorrectly marked as wrong, because Freebase has no information on them.
  • To mitigate this issue, Riedel et al. (2010) and Hoffmann et al. (2011) perform a second evaluation where they compute the accuracy of labels assigned to a set of relation mentions that they manually annotated.
  • The second is required because the focus of this work is not on sentence retrieval but on RE, which should be evaluated in isolation.
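The evaluation problem noted above, where a correct extraction is scored as wrong because the knowledge base lacks it, can be illustrated with a small precision/recall computation. This is a sketch under stated assumptions: the `precision_recall` helper and both triple sets are hypothetical and are not taken from the paper's data.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted triples against a gold set of triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical knowledge base used as gold; it is incomplete by design.
gold = {
    ("Barack Obama", "EmployedBy", "United States"),
}

# Hypothetical system output. The second triple is plausible in the real
# world but absent from the knowledge base, so it counts as an error.
predicted = {
    ("Barack Obama", "EmployedBy", "United States"),
    ("Xi Jinping", "EmployedBy", "China"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is underestimated here purely because of the gap in the knowledge base, which is why a second evaluation on manually annotated mentions is useful.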
Conclusion
  • In the KBP dataset, MIML-RE performs consistently better than the implementation of Hoffmann’s model, with higher precision values for the same recall point, and much higher overall recall.
  • The authors believe that these differences are caused by the Bayesian framework.
  • In this paper the authors showed that distant supervision for RE, which generates training data by aligning a database of facts with text, poses a distinct multi-instance multi-label learning scenario.
  • When all aspects of the MIML scenario are present, the model is well-equipped to handle them.
Tables
  • Table 1: Statistics about the two corpora used in this paper. Some of the numbers for the Riedel dataset are from (Riedel et al., 2010; Hoffmann et al., 2011)
  • Table 2: Results at the highest F1 point in the precision/recall curve on the dataset that contains groups with at least 10 mentions
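For readers unfamiliar with the "highest F1 point" used in Table 2, the sketch below picks that point from a precision/recall curve. The curve values are invented for illustration and are not results from the paper; `f1` and `best_f1_point` are hypothetical helper names.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def best_f1_point(curve):
    """Return the (precision, recall) point with the highest F1 on the curve."""
    return max(curve, key=lambda point: f1(*point))


# Made-up (precision, recall) points, ordered from strict to lenient thresholds.
curve = [(0.9, 0.1), (0.7, 0.3), (0.5, 0.4), (0.3, 0.5)]

best = best_f1_point(curve)
print(best, round(f1(*best), 3))
```

Reporting a single operating point this way makes two curves comparable in a table, at the cost of hiding how the systems behave elsewhere on the curve.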
Funding
  • We gratefully acknowledge the support of Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no
Reference
  • Kedar Bellare and Andrew McCallum. 2007. Learning extractors from unlabeled text using relevant databases. In Proceedings of the Sixth International Workshop on Information Extraction on the Web.
  • Carla Brodley and Mark Friedl. 1999. Identifying mislabeled training data. Journal of Artificial Intelligence Research (JAIR).
  • Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics.
  • Mark Craven and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology.
  • Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics.
  • Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
  • Heng Ji, Ralph Grishman, Hoa T. Dang, Kira Griffitt, and Joe Ellis. 2010. Overview of the TAC 2010 knowledge base population track. In Proceedings of the Text Analytics Conference.
  • Heng Ji, Ralph Grishman, and Hoa T. Dang. 2011. Overview of the TAC 2011 knowledge base population track. In Proceedings of the Text Analytics Conference.
  • Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics.
  • Truc Vien T. Nguyen and Alessandro Moschitti. 2011. End-to-end relation extraction using distant supervision from external semantic repositories. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
  • Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD ’10).
  • Ang Sun, Ralph Grishman, Wei Xu, and Bonan Min. 2011. New York University 2011 system for KBP slot filling. In Proceedings of the Text Analytics Conference.
  • Mihai Surdeanu, Sonal Gupta, John Bauer, David McClosky, Angel X. Chang, Valentin I. Spitkovsky, and Christopher D. Manning. 2011a. Stanford’s distantly-supervised slot-filling system. In Proceedings of the Text Analytics Conference.
  • Mihai Surdeanu, David McClosky, Mason R. Smith, Andrey Gusev, and Christopher D. Manning. 2011b. Customizing an information extraction system to a new domain. In Proceedings of the Workshop on Relational Models of Semantics, Portland, Oregon, June.
  • Fei Wu and Dan Weld. 2007. Autonomously semantifying Wikipedia. In Proceedings of the International Conference on Information and Knowledge Management (CIKM).
  • Z.H. Zhou and M.L. Zhang. 2007. Multi-instance multi-label learning with application to scene classification. In Advances in Neural Information Processing Systems (NIPS).