CSTNET: Enhancing Global-To-Local Interactions for Image Captioning

ICIP (2022)

Abstract
Image captioning aims to generate descriptions of images, which requires capturing the complex interactions between local regions and the global context within an image. However, effectively modeling the global context of an image remains a challenging research problem. Existing approaches, mostly built on the transformer architecture, incorporate global-level information only into the initialized input. Unlike these methods, which may fail to capture rich global contextual information, we propose a novel model named the Context-Sensitive Transformer (CSTNet), which discovers the inherent global context and further strengthens global-to-local interactions. Experimental results on the MSCOCO dataset show that the proposed model significantly improves image captioning performance.
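The abstract and keywords point to a gate mechanism that injects global context into local region features. The paper's actual CSTNet architecture is not described here, so the following is only a minimal illustrative sketch of a generic sigmoid-gated global-to-local fusion: the global context is mean-pooled from the region features, and a gate (with randomly initialized weights standing in for learned parameters) decides, per feature dimension, how much global context to mix into each region.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_global_fusion(regions, rng=None):
    """Hypothetical sketch of a gated global-to-local fusion step.

    regions: (n, d) array of local region features.
    Returns an (n, d) array where each region is a gated blend of
    itself and a mean-pooled global context vector.
    """
    n, d = regions.shape
    g = regions.mean(axis=0)                      # global context: mean-pooled regions
    rng = np.random.default_rng(0) if rng is None else rng
    # Gate projection; in a trained model this weight would be learned.
    W = rng.normal(scale=0.1, size=(2 * d, d))
    g_tiled = np.tile(g, (n, 1))                  # broadcast global vector to each region
    gate = sigmoid(np.concatenate([regions, g_tiled], axis=1) @ W)
    # Per-dimension convex combination of local and global features.
    return gate * regions + (1.0 - gate) * g_tiled
```

Because the gate output lies in (0, 1), each fused feature is an elementwise convex combination of the local region feature and the global context vector, so the fusion can interpolate smoothly between purely local and purely global information.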
Keywords
Image captioning, Gate mechanism, Vision transformer, Deep neural network