Enhanced Text-Guided Attention Model For Image Captioning

2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)

Cited by 47 | Viewed 88
Abstract
The attention mechanism plays an important role in understanding images and has proven effective for generating natural language descriptions of them. In recent years, with the advance of deep neural networks, visual attention has been well exploited in the encoder-decoder framework. On the one hand, existing studies show that guidance captions can help attend to relevant image regions and suppress unimportant ones during the image encoding stage, especially for cluttered images. On the other hand, visual attention has been well exploited during the decoding stage. Observing that these two mechanisms are naturally complementary, we propose a two-side attention model that combines them seamlessly with a coarse-to-fine attention scheme. The original text-guided attention model operates on region-level image features, which lack definite semantic information and therefore yield unsatisfactory attention visualizations. We alleviate this problem by computing attention over object-level image features, which improves performance and produces more interpretable attention visualizations. Experiments conducted on the MSCOCO dataset demonstrate consistent improvements over the text-guided attention model for image captioning.
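The core operation described above, attending over object-level image features under the guidance of a caption embedding, can be sketched as a simple bilinear attention. This is an illustrative sketch only, not the paper's implementation: the function and variable names (`text_guided_attention`, `object_feats`, `guidance_vec`, `W`) and the bilinear scoring form are assumptions for exposition.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def text_guided_attention(object_feats, guidance_vec, W):
    """Attend over object-level image features guided by a caption embedding.

    object_feats: (k, d_img) array, one feature vector per detected object
    guidance_vec: (d_txt,) embedding of the guidance caption
    W:            (d_img, d_txt) learned bilinear compatibility matrix (here random)
    Returns the attention weights over objects and the attended image feature.
    """
    scores = object_feats @ W @ guidance_vec   # (k,) compatibility scores
    weights = softmax(scores)                  # attention distribution over objects
    context = weights @ object_feats           # (d_img,) weighted sum of object features
    return weights, context

# Toy example with random features standing in for detector and text-encoder outputs.
rng = np.random.default_rng(0)
k, d_img, d_txt = 5, 8, 6
object_feats = rng.standard_normal((k, d_img))
guidance_vec = rng.standard_normal(d_txt)
W = rng.standard_normal((d_img, d_txt))
weights, context = text_guided_attention(object_feats, guidance_vec, W)
```

Because each weight corresponds to one detected object rather than a uniform grid cell, visualizing `weights` over the objects' bounding boxes gives the more interpretable attention maps the abstract refers to.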
Keywords
Image Captioning, Attention Mechanism, Text-Guided