A cooperative approach based on self-attention with interactive attribute for image caption

MULTIMEDIA TOOLS AND APPLICATIONS (2022)

Abstract
Image captioning is a challenging problem in image understanding, and most models are trained with a framework that combines a deep convolutional neural network with a recurrent neural network. However, the features extracted by the convolutional neural network tend to capture only the salient regions and fail to cover the finer details of the image. Moreover, the vanishing gradient problem of recurrent neural networks causes earlier information to be lost as the number of time steps grows. In this paper, Cooperative Self-Attention (CSA) is proposed to address these problems. Compared with existing methods, our model enhances the representation of the image by fusing additional attribute information obtained from object detection. A sub-module named Inter-Attribute, which represents the interaction between objects, is proposed to strengthen the context of the entities. By virtue of the advantages of Self-Attention, and unlike previous methods that predict the next word from one prior word and the hidden state, our model concatenates all of the words generated so far at each step to handle long-term dependencies. Compared with published state-of-the-art methods, our CSA demonstrates outstanding performance.
Keywords
Image caption, Deep neural network, Self-attention
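
The abstract contrasts predicting the next word from a single prior word and hidden state with attending over every word generated so far. The following is a minimal sketch of that general idea, using plain scaled dot-product self-attention in which the newest word acts as the query over the whole generated prefix. It is not the authors' CSA model: the function names, the toy two-dimensional embeddings, and the omission of learned query/key/value projections are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_prefix(prefix_embeddings):
    """Let the newest word attend to every word generated so far.

    Hypothetical helper: raw embeddings serve as queries, keys, and values
    (a real model would apply learned projections first).
    Returns (attention weights, context vector).
    """
    query = prefix_embeddings[-1]
    dim = len(query)
    # Scaled dot-product score between the query and each prefix word.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in prefix_embeddings
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the prefix embeddings.
    context = [
        sum(w * vec[i] for w, vec in zip(weights, prefix_embeddings))
        for i in range(dim)
    ]
    return weights, context


# Toy embeddings for three already-generated words (made-up values).
generated = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend_over_prefix(generated)
```

Because the context vector is computed directly from all prefix positions, information from early words does not have to survive a chain of recurrent updates, which is the property the abstract appeals to when discussing long-term dependencies.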