Unsupervised discovery of extraction patterns for information extraction
Unsupervised discovery of extraction patterns for information extraction(2004)
摘要
The task of Information Extraction (IE) is to find specific types of information in natural language text. In particular, event extraction identifies instances of a particular type of event or fact (a particular “scenario”), including the entities involved, and fills a database which has been pre-defined for the scenario. As the number of documents available on-line has multiplied, entity extraction has grown in importance for various applications, including tracking terrorist activities from newswire sources and building a database of job postings from the Web, to name a few. Linguistic contexts, such as predicate-argument relationships, have been widely used as extraction patterns to identify the items to be extracted from the text. The cost of creating extraction patterns for each scenario has been a bottleneck limiting the portability of information extraction systems to different scenarios, although there has been some research on semi-supervised pattern discovery procedures to reduce this cost. The challenge is to develop a fully automatic method for identifying extraction patterns for a scenario specified by the user. This dissertation presents a novel approach for the unsupervised discovery of extraction patterns for event extraction from raw text. First, we present a framework that allows the user to have a self-customizing information ex traction system for his/her query: the Query-Driven Information Extraction (QDIE) framework. The input to the QDIE framework is the user's query: either a set of keywords or a narrative description of the event extraction task. Second, we assess the improvement in extraction pattern models. By considering the shortcomings of the prior work based on predicate-argument models and their extensions, we propose a novel extraction pattern model that is based on arbitrary subtrees of dependency trees. Third, we address the issue of portability across languages. As a case study of the QDIE framework, we implemented a pre-CODIE system, a Cross-Lingual On-Demand Information Extraction system requiring minimal human intervention, which incorporates the QDIE framework as a component for pattern discovery. In addition, we assess the role of machine translation in cross-lingual information extraction by comparing translation-based implementations.
更多查看译文
关键词
unsupervised discovery,information extraction system,entity extraction,cross-lingual information extraction,different scenario,novel extraction pattern model,event extraction,extraction pattern model,event extraction task,qdie framework,extraction pattern,linguistics,information retrieval,natural language,extraction,data bases
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要