OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue

International Conference on Language Resources and Evaluation (LREC), 2022

Entity linking in dialogue is the task of mapping entity mentions in utterances to a target knowledge base. Prior work on entity linking has mainly focused on well-written text such as Wikipedia articles, annotated newswire, or domain-specific datasets. We extend the study of entity linking to open-domain dialogue by presenting the OpenEL corpus: an annotated multi-domain corpus for linking entities in natural conversation to Wikidata. Every utterance in 179 dialogues covering 12 topics from the original EDINA corpus has been annotated for entities realized both by definite referring expressions and by anaphoric forms such as he, she, it, and they. OpenEL thus supports training and evaluation of entity linking in open-domain dialogue, as well as analysis of the effect of using dialogue context and anaphora resolution in model training. It can also be used to fine-tune a coreference resolution model. To the best of our knowledge, this is the first substantial entity linking corpus publicly available for open-domain dialogue. We also establish baselines for named entity linking in open-domain conversation using several existing entity linking systems. The Transformer-based system Flair + BLINK performs best, with an F1 score of 0.65. Our results show that dialogue context is extremely beneficial for entity linking in conversations: without discourse context, the F1 of Flair + BLINK drops from 0.65 to 0.61. These results also reveal a remaining performance gap between the baselines and human performance, highlighting the challenges of entity linking in open-domain dialogue and suggesting many avenues for future research using OpenEL.
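The abstract reports linking quality as an F1 score over mentions mapped to Wikidata, where mentions include anaphoric forms such as "it". The sketch below shows one way such a score can be computed. It assumes strict matching (a prediction counts only when both its character span and its Wikidata ID equal a gold annotation) and micro-averaging over all links. The Link record, its field names, and the toy two-turn dialogue are hypothetical illustrations, not the OpenEL release format or its official scorer.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Link:
    """One mention-to-entity link (hypothetical schema, not the OpenEL file format)."""
    dialogue_id: str
    turn: int    # index of the utterance within the dialogue
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    qid: str     # Wikidata item identifier, e.g. "Q90"


def link_f1(gold: Iterable[Link], predicted: Iterable[Link]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 under an assumed strict-match rule:
    a predicted link is correct only if span and QID both equal a gold link."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy dialogue (invented for illustration, not taken from the corpus).
    utterances = ["I just got back from Paris.", "Did you like it?"]
    gold = [
        Link("d1", 0, 21, 26, "Q90"),  # "Paris" -> Q90 (Paris)
        Link("d1", 1, 13, 15, "Q90"),  # anaphoric "it" -> the same entity
    ]
    # Sanity-check that the offsets point at the intended mentions.
    assert utterances[0][21:26] == "Paris" and utterances[1][13:15] == "it"
    # A linker that ignores dialogue context resolves only the explicit name.
    predicted = [Link("d1", 0, 21, 26, "Q90")]
    p, r, f = link_f1(gold, predicted)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=1.00 R=0.50 F1=0.67
```

In this toy case the explicit name "Paris" is linked but the anaphoric "it" is missed, so recall halves. That is a miniature of why anaphora resolution and dialogue context matter for the task. A lenient variant that credits overlapping spans would be an equally plausible design choice; the abstract does not say which matching rule OpenEL uses.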
Keywords: Entity Linking, Coreference, Discourse Modeling, Wikidata, Open-domain Dialogue