PAIC: Parallelised Attentive Image Captioning

Databases Theory and Applications, ADC 2020 (2020)

Cited 3 | Viewed 69
Abstract
Most encoder-decoder architectures generate the image description sentence with a recurrent neural network (RNN). However, an RNN decoder trained by Back Propagation Through Time (BPTT) is inherently time-consuming and suffers from the vanishing gradient problem. To overcome these difficulties, we propose a novel Parallelised Attentive Image Captioning Model (PAIC) that relies purely on an optimised attention mechanism to decode natural sentences, without using RNNs. At each decoding step, our model precisely localises different areas of the image via a spatial attention module, while capturing the word sequence with the well-attested multi-head self-attention mechanism. In contrast to RNNs, the proposed PAIC can efficiently exploit the parallel computation of GPU hardware during training, and further eases gradient propagation. Extensive experiments on MS-COCO demonstrate that PAIC significantly reduces training time while achieving competitive performance compared to conventional RNN-based models.
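The key property the abstract relies on is that a masked multi-head self-attention decoder processes every target-word position at once, rather than step by step as an RNN must. Below is a minimal NumPy sketch of such a causally masked self-attention layer; the function name, shapes, and weight layout are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, Wq, Wk, Wv, n_heads):
    """Masked multi-head self-attention over a whole word sequence at once.

    X: (T, d) embeddings for all T target words. All positions are
    computed in parallel; the causal mask ensures position t only
    attends to positions <= t, mimicking left-to-right decoding.
    """
    T, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Split into heads: (n_heads, T, dh)
    Q = Q.reshape(T, n_heads, dh).transpose(1, 0, 2)
    K = K.reshape(T, n_heads, dh).transpose(1, 0, 2)
    V = V.reshape(T, n_heads, dh).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)    # (h, T, T)
    mask = np.triu(np.ones((T, T)), k=1).astype(bool)  # strictly-upper = future
    scores = np.where(mask, -1e9, scores)              # hide future words
    out = softmax(scores) @ V                          # (h, T, dh)
    return out.transpose(1, 0, 2).reshape(T, d)        # merge heads: (T, d)

rng = np.random.default_rng(0)
T, d, h = 5, 8, 2
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
Y = causal_self_attention(X, Wq, Wk, Wv, h)
print(Y.shape)
```

Because the whole (T, T) score matrix is computed with a single batched matrix multiply, training parallelises across sequence positions on a GPU, which is the efficiency argument the abstract makes against BPTT-trained RNN decoders.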
Keywords
Image captioning, Self-attention, Parallel training