Image Captioning with Synergy-Gated Attention and Recurrent Fusion LSTM

KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS(2022)

引用 0|浏览2
暂无评分
摘要
Long Short-Term Memory (LSTM) combined with attention mechanism is extensively used to generate semantic sentences of images in image captioning models. However, features of salient regions and spatial information are not utilized sufficiently in most related works. Meanwhile, the LSTM also suffers from the problem of underutilized information in a single time step. In the paper, two innovative approaches are proposed to solve these problems. First, the Synergy-Gated Attention (SGA) method is proposed, which can process the spatial features and the salient region features of given images simultaneously. SGA establishes a gated mechanism through the global features to guide the interaction of information between these two features. Then, the Recurrent Fusion LSTM (RF-LSTM) mechanism is proposed, which can predict the next hidden vectors in one time step and improve linguistic coherence by fusing future information. Experimental results on the benchmark dataset of MSCOCO show that compared with the state-of-the-art methods, the proposed method can improve the performance of image captioning model, and achieve competitive performance on multiple evaluation indicators.
更多
查看译文
关键词
Image captioning, Synergy-Gated Attention, Recurrent Fusion LSTM, Deep learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要