Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning
arxiv(2024)
摘要
The chain-of-thought technique has been received well in multi-modal tasks.
It is a step-by-step linear reasoning process that adjusts the length of the
chain to improve the performance of generated prompts. However, human thought
processes are predominantly non-linear, as they encompass multiple aspects
simultaneously and employ dynamic adjustment and updating mechanisms.
Therefore, we propose a novel Aggregation-Graph-of-Thought (AGoT) mechanism for
soft-prompt tuning in multi-modal representation learning. The proposed AGoT
models the human thought process not only as a chain but also models each step
as a reasoning aggregation graph to cope with the overlooked multiple aspects
of thinking in single-step reasoning. This turns the entire reasoning process
into prompt aggregation and prompt flow operations. Experiments show that our
multi-modal model enhanced with AGoT soft-prompting achieves good results in
several tasks such as text-image retrieval, visual question answering, and
image recognition. In addition, we demonstrate that it has good domain
generalization performance due to better reasoning.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要