MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing
CoRR(2024)
摘要
Multimodal knowledge editing represents a critical advancement in enhancing
the capabilities of Multimodal Large Language Models (MLLMs). Despite its
potential, current benchmarks predominantly focus on coarse-grained knowledge,
leaving the intricacies of fine-grained (FG) multimodal entity knowledge
largely unexplored. This gap presents a notable challenge, as FG entity
recognition is pivotal for the practical deployment and effectiveness of MLLMs
in diverse real-world scenarios. To bridge this gap, we introduce MIKE, a
comprehensive benchmark and dataset specifically designed for the FG multimodal
entity knowledge editing. MIKE encompasses a suite of tasks tailored to assess
different perspectives, including Vanilla Name Answering, Entity-Level Caption,
and Complex-Scenario Recognition. In addition, a new form of knowledge editing,
Multi-step Editing, is introduced to evaluate the editing efficiency. Through
our extensive evaluations, we demonstrate that the current state-of-the-art
methods face significant challenges in tackling our proposed benchmark,
underscoring the complexity of FG knowledge editing in MLLMs. Our findings
spotlight the urgent need for novel approaches in this domain, setting a clear
agenda for future research and development efforts within the community.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要