Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
CoRR (2024)
Abstract
Large Multimodal Models (LMMs) often suffer from multimodal hallucinations,
wherein they may create content that is not present in the visual inputs. In
this paper, we explore a new angle of this issue: overly detailed training data
hinders the model's ability to terminate generation in a timely manner, leading to
continued outputs beyond visual perception limits. By investigating how the
model decides to terminate generation with EOS, the special end-of-sentence
token, we find that the model assesses the completeness of the entire sequence
by comparing the generated text with the image. This observation suggests that
the model possesses an inherent potential to make proper EOS decisions based
on its visual perception to avoid overly lengthy outputs. To take advantage of
such potential, we explore two methods to mitigate multimodal hallucinations: a
training objective that enables the model to reduce hallucinations by learning
from regular instruction data, and a data filtering strategy to prevent harmful
training data from exacerbating model hallucinations. Both methods
significantly reduce hallucinations in LMMs, without requiring
any additional data or knowledge.