Adaptive Semantic-Enhanced Transformer for Image Captioning

Jing Zhang, Zhongjun Fang,Han Sun,Zhe Wang

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS(2024)

引用 7|浏览19
暂无评分
摘要
In the research on image captioning, rich semantic information is very important for generating critical caption words as guiding information. However, semantic information from offline object detectors involves many semantic objects that do not appear in the caption, thereby bringing noise into the decoding process. To produce more accurate semantic guiding information and further optimize the decoding process, we propose an end-to-end adaptive semantic-enhanced transformer (AS-Transformer) model for image captioning. For semantic enhancement information extraction, we propose a constrained weaklysupervised learning (CWSL) module, which reconstructs the semantic object's probability distribution detected by the multiple instances learning (MIL) through a joint loss function. These strengthened semantic objects from the reconstructed probability distribution can better depict the semantic meaning of images. Also, for semantic enhancement decoding, we propose an adaptive gated mechanism (AGM) module to adjust the attention between visual and semantic information adaptively for the more accurate generation of caption words. Through the joint control of the CWSL module and AGM module, our proposed model constructs a complete adaptive enhancement mechanism from encoding to decoding and obtains visual context that is more suitable for captions. Experiments on the public Microsoft Common Objects in COntext (MSCOCO) and Flickr30K datasets illustrate that our proposed AS-Transformer can adaptively obtain effective semantic information and adjust the attention weights between semantic and visual information automatically, which achieves more accurate captions compared with semantic enhancement methods and outperforms state-of-the-art methods.
更多
查看译文
关键词
Semantics,Visualization,Decoding,Transformers,Adaptation models,Detectors,Image reconstruction,Adaptive gated mechanism (AGM),adaptive semantic-enhanced transformer (AS-Transformer),constrained weakly supervised learning,image captioning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要