Text-to-Text Pre-Training with Paraphrasing for Improving Transformer-Based Image Captioning

2023 31st European Signal Processing Conference (EUSIPCO)

Abstract
In this paper, we propose a novel training method for transformer encoder-decoder based image captioning, which directly generates a caption from an input image. In general, a large image-to-text paired dataset is needed for robust image captioning, but such datasets are often difficult to collect in practice. Our key idea for mitigating the data preparation cost is to utilize text-to-text paraphrasing modeling, i.e., the task of converting an input text into a different expression without changing its meaning. Paraphrasing performs a transformation similar to image captioning, even though it handles text rather than images. In our proposed method, an encoder-decoder network trained on the paraphrasing task is directly leveraged for image captioning: a network pre-trained on a text-to-text transformation task is transferred to an image-to-text transformation task, even though the encoder must handle a different modality. Our experiments on the MS COCO caption dataset demonstrate the effectiveness of the proposed method.
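The abstract's transfer scheme — pre-train an encoder-decoder on text-to-text paraphrasing, then reuse the same network for image-to-text captioning — can be sketched roughly as follows. This is a minimal PyTorch sketch under my own assumptions, not the authors' implementation: all dimensions are toy values, and the image-feature projection (`img_proj`) is a hypothetical adapter I introduce to feed visual features into the text-pre-trained encoder.

```python
# Hedged sketch (not the paper's code): pre-train a transformer
# encoder-decoder on text-to-text paraphrasing, then transfer it to
# image-to-text captioning by swapping only the encoder's input
# embedding for a projection of image features. Toy sizes throughout.
import torch
import torch.nn as nn

VOCAB, D_MODEL, IMG_FEAT = 100, 32, 64

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(VOCAB, D_MODEL)  # used only for text input
        self.tgt_embed = nn.Embedding(VOCAB, D_MODEL)
        self.core = nn.Transformer(d_model=D_MODEL, nhead=4,
                                   num_encoder_layers=2, num_decoder_layers=2,
                                   batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB)

    def forward(self, src, tgt):
        # src: token ids (paraphrasing stage) or pre-projected image
        # features (captioning stage) of shape (batch, len, D_MODEL).
        s = self.src_embed(src) if src.dtype == torch.long else src
        return self.out(self.core(s, self.tgt_embed(tgt)))

model = Seq2Seq()

# Stage 1: text-to-text paraphrasing pre-training (one toy step).
src_txt = torch.randint(0, VOCAB, (2, 7))
tgt_txt = torch.randint(0, VOCAB, (2, 5))
loss = nn.functional.cross_entropy(
    model(src_txt, tgt_txt).reshape(-1, VOCAB), tgt_txt.reshape(-1))
loss.backward()

# Stage 2: transfer to image-to-text. The pre-trained encoder-decoder is
# kept; only the input changes to projected image region features
# (hypothetical adapter, e.g. 10 region vectors per image).
img_proj = nn.Linear(IMG_FEAT, D_MODEL)
img_regions = torch.randn(2, 10, IMG_FEAT)
caption = torch.randint(0, VOCAB, (2, 5))
logits = model(img_proj(img_regions), caption)  # shape: (2, 5, VOCAB)
```

The point of the sketch is that nothing inside `model.core` changes between the two stages — the text-to-text pre-training directly initializes the image-to-text model, with only the encoder's input pathway replaced.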
Keywords
image captioning,transformer encoder-decoder,paraphrasing,pre-training