ECENet: Explainable and Context-Enhanced Network for Multi-modal Fact Verification

MM '23: Proceedings of the 31st ACM International Conference on Multimedia (2023)

Recently, falsified claims incorporating both text and images have been disseminated more effectively than those containing text alone, raising significant concerns for multi-modal fact verification. Existing research makes contributions to multi-modal feature extraction and interaction, but fails to fully utilize and enhance the valuable and intricate semantic relationships between distinct features. Moreover, most detectors merely provide a single outcome judgment and lack an inference process or explanation. Taking these factors into account, we propose a novel Explainable and Context-Enhanced Network (ECENet) for multi-modal fact verification, making the first attempt to integrate multi-clue feature extraction, multi-level feature reasoning, and justification (explanation) generation within a unified framework. Specifically, we propose an Improved Coarse- and Fine-grained Attention Network, equipped with two types of level-grained attention mechanisms, to facilitate a comprehensive understanding of contextual information. Furthermore, we propose a novel justification generation module via deep reinforcement learning that does not require additional labels. In this module, a sentence extractor agent measures the importance between the query claim and all document sentences at each time step, selecting a suitable amount of high-scoring sentences to be rewritten as the explanation of the model. Extensive experiments demonstrate the effectiveness of the proposed method.
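
The sentence-selection step described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' method: the paper's extractor is an agent trained with deep reinforcement learning, while this example scores each document sentence against the claim with plain bag-of-words cosine similarity and keeps the top-scoring ones. The function names (`tokenize`, `cosine`, `select_sentences`) and the parameter `k` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_sentences(claim, sentences, k=2):
    # Assumption: similarity to the claim stands in for the learned
    # importance score; the paper learns this score with an RL agent.
    claim_vec = Counter(tokenize(claim))
    scored = [
        (cosine(claim_vec, Counter(tokenize(s))), i)
        for i, s in enumerate(sentences)
    ]
    # Keep the k highest-scoring sentences (ties broken by position),
    # then return them in their original document order.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]


claim = "The bridge collapsed after the flood"
document = [
    "The flood damaged the old bridge.",
    "Local markets opened on Sunday.",
    "Officials said the bridge collapsed overnight.",
]
selected = select_sentences(claim, document, k=2)
```

In the paper the selected sentences are then rewritten into the final explanation; that rewriting step is outside the scope of this sketch.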