Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization

IEEE Intelligent Systems (2024)

Few-Shot Learning (FSL) requires fine-tuning a pre-trained model on a limited set of examples from novel classes. When applied to vision-and-language models, the dominant approach to FSL has been to learn input prompts that are concatenated to the model's input context. Despite their considerable promise, the effectiveness and expressive power of prompts are limited by the fact that they can only lie at the input of the architecture. In this paper, we critically question the usage of learnable prompts and instead leverage the concept of "implicit memory" to directly capture low- and high-level relationships within the attention mechanism at any layer of the architecture, thereby establishing an alternative to prompts in FSL. Our proposed approach, termed MemOp, exhibits superior performance across 11 widely recognized image classification datasets and a benchmark for contextual domain shift evaluation, effectively addressing the challenges associated with learnable prompts.