Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles
International Conference on Computational Linguistics (2024)
Abstract
Event coreference resolution (ECR) is the task of determining whether
distinct mentions of events within a multi-document corpus refer
to the same underlying occurrence. Images of the events can facilitate
resolution when language is ambiguous. Here, we propose a multimodal
cross-document event coreference resolution method that integrates visual and
textual cues through a simple linear map between vision and language models. Because
existing ECR benchmark datasets rarely provide images for all event mentions,
we augment the popular ECB+ dataset with event-centric images scraped from the
internet and generated using image diffusion models. We establish three methods
that incorporate images and text for coreference: 1) a standard fused model
with finetuning, 2) a novel linear mapping method without finetuning, and 3) an
ensembling approach based on splitting mention pairs by semantic and
discourse-level difficulty. We evaluate on two datasets: the augmented ECB+ and
AIDA Phase 1. Our ensemble systems using cross-modal linear mapping establish
an upper limit (91.9 CoNLL F1) on ECB+ ECR performance given the preprocessing
assumptions used, and establish a novel baseline on AIDA Phase 1. Our results
demonstrate the utility of multimodal information in ECR for certain
challenging coreference problems, and highlight a need for more multimodal
resources in the coreference resolution space.
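The abstract does not say how the linear map between the vision and language models is obtained. As a minimal sketch, assuming paired image and text embeddings of the same events are available, one common way to relate two embedding spaces is to fit a single matrix W by ridge-regularized ordinary least squares and then project image vectors into the text space, where they can be compared with cosine similarity. The function names (`fit_linear_map`, `cosine`) and the toy vectors are hypothetical; this is not the authors' implementation, and their map may be trained differently.

```python
from math import sqrt
from typing import List

# Hypothetical illustration: row-major matrices as nested lists, no external deps.
Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for x (a square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]  # normalize the pivot row
        for r in range(n):
            if r != col:
                f = aug[r][col]
                # Eliminate this column from every other row.
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(img: Matrix, txt: Matrix, ridge: float = 1e-6) -> Matrix:
    """Return W minimizing ||img @ W - txt||^2 + ridge * ||W||^2 via the normal equations."""
    xt = transpose(img)
    gram = matmul(xt, img)
    for i in range(len(gram)):
        gram[i][i] += ridge
    return solve(gram, matmul(xt, txt))


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


# Toy paired embeddings: 2-d "image" vectors and 3-d "text" vectors (made-up numbers).
image_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
text_emb = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [1.0, 3.0, -1.0], [2.0, 5.0, -1.0]]

W = fit_linear_map(image_emb, text_emb)
projected = matmul([[1.0, 1.0]], W)[0]  # map one image vector into the text space
print([round(v, 3) for v in projected])
print(round(cosine(projected, text_emb[2]), 3))
```

Because the toy text vectors were generated from an exact linear map, the fitted W recovers that map up to the tiny ridge term, so the projected image vector lands on its paired text vector (cosine similarity of about 1). With real embeddings the fit is only approximate, and the small ridge term simply keeps the normal equations well conditioned.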
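For reference, the CoNLL F1 score reported above follows the CoNLL shared-task convention: it is the unweighted mean of the MUC, B³, and CEAF_e F1 scores.

```latex
\text{CoNLL F1} = \frac{1}{3}\left(F_1^{\text{MUC}} + F_1^{B^3} + F_1^{\text{CEAF}_e}\right)
```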