Utilising Technical Term Extraction in Coreference Resolution on General Academic Domains


Coreference resolution, the task of identifying mentions that refer to the same entity, is an important component of natural language understanding applications such as question answering, summarisation, and information extraction (IE). The task looks simple for human beings, who intuitively and repeatedly resolve coreferring mentions whenever they encounter anaphoric expressions or rephrasings, yet it is not trivial for automated systems because of the ambiguous nature of natural language. Coreference resolution has a long history, with many proposed techniques [2, 9, 10] and datasets [4, 6, 13, 14]. Most of them focus on texts in the general domain, such as news articles, allowing automated systems to aggregate information from digital documents. Approaches for academic domains, however, remain very limited. To the best of our knowledge, the only academic field that has received wide attention is biomedicine. There is also a coreference resolution corpus in computational linguistics built from documents in the ACL Anthology ??. However, there have been few, if any, attempts to tackle the problem in general academic domains.
After a preliminary investigation of scientific abstracts from several domains, we noticed that technical terms (lexical units that are used in a more or less specialised way in a domain [7]) are promising candidates for coreference mentions in academic domains, since technical terms can be considered the main participants in academic writing. We therefore focus on improving coreference resolution in general academic domains by utilising extracted technical terms. We have created datasets for technical term extraction and coreference resolution based on a corpus covering multiple academic domains. The datasets are used to train and test our term extraction system, and to evaluate our proposed methods for integrating term extraction results into an existing coreference resolution system, namely Stanford’s Dcoref coreference resolver [9, 10].
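The following is a minimal Python sketch that illustrates the general idea of treating extracted technical terms as additional mention candidates. It is not the integration method proposed in this work, which section 5 describes. The sketch assumes that baseline mentions and extracted terms are both given as token-offset spans. The function name `merge_mention_candidates` and the rule of discarding a term that crosses an existing mention are hypothetical choices for illustration. Neither is part of Dcoref's API.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def merge_mention_candidates(baseline: List[Span], terms: List[Span]) -> List[Span]:
    """Hypothetical sketch: add extracted technical-term spans to a baseline
    set of mention candidates.

    Exact duplicates collapse into one span. A term span is skipped if it
    crosses (partially overlaps without nesting) a baseline mention, because
    crossing spans cannot both be constituents of a single parse tree.
    """
    merged: Set[Span] = set(baseline)
    for t_start, t_end in terms:
        crosses = any(
            b_start < t_start < b_end < t_end or t_start < b_start < t_end < b_end
            for b_start, b_end in baseline
        )
        if not crosses:
            merged.add((t_start, t_end))
    return sorted(merged)


# Usage example with hand-made spans (illustrative data, not from the paper's datasets).
tokens = "The proposed term extractor improves coreference resolution .".split()
baseline_mentions = [(0, 4)]           # "The proposed term extractor"
extracted_terms = [(2, 4), (5, 7), (3, 5)]  # (3, 5) crosses (0, 4) and is dropped
for start, end in merge_mention_candidates(baseline_mentions, extracted_terms):
    print(" ".join(tokens[start:end]))
# prints:
# The proposed term extractor
# term extractor
# coreference resolution
```

Nested spans such as "term extractor" inside "The proposed term extractor" are kept in this sketch. Deciding whether to keep nested spans is a design choice that a real mention detector would need to make.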
In section 2, we review previous work on technical term extraction and coreference resolution that is related to ours. We describe the characteristics of our dataset in section 3. The details of our technical term extraction unit are presented in section 4. Section 5 presents our methods for integrating the result of term extraction into the mention detection module. We conclude our work in section 6.