Image Captioning using Adversarial Networks and Reinforcement Learning

2018 24th International Conference on Pattern Recognition (ICPR)

Cited by 16
Abstract
Image captioning is an important task in artificial intelligence that connects computer vision and natural language processing. With the rapid development of deep learning, the sequence-to-sequence model with attention has become one of the main approaches to image captioning. Nevertheless, the current framework has a significant issue: the exposure bias of Maximum Likelihood Estimation (MLE) training in the sequence model, where the model is trained on ground-truth prefixes but must condition on its own predictions at test time. To address this problem, we apply generative adversarial networks (GANs) to image captioning, which compensates for the exposure bias of MLE and can also produce more realistic captions. However, GANs cannot be applied directly to discrete tasks such as language generation: sampling discrete words is not differentiable, so the discriminator's gradients cannot be passed back to the generator. We therefore use a reinforcement learning (RL) technique to estimate the gradients for the generator. In addition, to obtain intermediate rewards during caption generation, we use Monte Carlo roll-out sampling. Experimental results on the COCO dataset validate the improvement contributed by each component of the proposed model, and the overall effectiveness of the model is also evaluated.
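The combination of a discriminator reward, Monte Carlo roll-outs, and an RL gradient estimate can be illustrated with a minimal sketch. It assumes a SeqGAN-style REINFORCE setup; the abstract does not give the paper's exact generator, discriminator, or reward formulation. Everything below is hypothetical: the "generator" is a per-step table of softmax logits that ignores the image and the prefix, `toy_discriminator` is a stand-in that scores a caption by its fraction of token 0, and the names `rollout_rewards`, `policy_gradient`, and `train` are illustrative only.

```python
import math
import random
from typing import Callable, List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(probs: Sequence[float], rng: random.Random) -> int:
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def toy_discriminator(tokens: Sequence[int]) -> float:
    # Hypothetical stand-in for D(image, caption): a score in [0, 1].
    return sum(1 for y in tokens if y == 0) / len(tokens)

def rollout_rewards(tokens: Sequence[int], logits_table: List[List[float]],
                    discriminator: Callable[[Sequence[int]], float],
                    num_rollouts: int, rng: random.Random) -> List[float]:
    """Monte Carlo roll-out: estimate an intermediate reward Q_t for each prefix
    tokens[:t+1] by completing it num_rollouts times with the generator and
    averaging the discriminator's score on the completed captions."""
    T = len(tokens)
    rewards = []
    for t in range(T):
        if t == T - 1:
            rewards.append(discriminator(tokens))  # complete caption: score directly
            continue
        total = 0.0
        for _ in range(num_rollouts):
            completion = list(tokens[: t + 1])
            for s in range(t + 1, T):
                completion.append(sample_token(softmax(logits_table[s]), rng))
            total += discriminator(completion)
        rewards.append(total / num_rollouts)
    return rewards

def policy_gradient(tokens: Sequence[int], logits_table: List[List[float]],
                    rewards: Sequence[float]) -> List[List[float]]:
    """REINFORCE gradient of sum_t Q_t * log pi(y_t) w.r.t. the logits.
    For a softmax policy, d log pi(y) / d logit_k = 1[k == y] - pi(k)."""
    grads = []
    for t, y in enumerate(tokens):
        probs = softmax(logits_table[t])
        grads.append([rewards[t] * ((1.0 if k == y else 0.0) - p)
                      for k, p in enumerate(probs)])
    return grads

def train(steps: int = 300, lr: float = 0.5, seq_len: int = 4, vocab: int = 3,
          num_rollouts: int = 8, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    logits_table = [[0.0] * vocab for _ in range(seq_len)]  # uniform start
    for _ in range(steps):
        tokens = [sample_token(softmax(row), rng) for row in logits_table]
        rewards = rollout_rewards(tokens, logits_table, toy_discriminator,
                                  num_rollouts, rng)
        grads = policy_gradient(tokens, logits_table, rewards)
        for t in range(seq_len):
            for k in range(vocab):
                logits_table[t][k] += lr * grads[t][k]  # ascend expected reward
    return logits_table

if __name__ == "__main__":
    table = train()
    print([round(softmax(row)[0], 3) for row in table])  # P(token 0) per step
```

After training, the generator should put more probability on the tokens the toy discriminator rewards. The sketch uses no reward baseline for brevity; subtracting a baseline (for example, the mean roll-out reward) is a common way to reduce the variance of the REINFORCE estimate.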
Keywords
Monte Carlo roll-out sampling method,maximum likelihood estimation,computer vision,artificial intelligence,sequence to sequence model,reinforcement learning technique,generative adversarial networks,exposure bias problem,deep learning,natural language processing,image captioning