Multimodal Image Captioning Through Combining Reinforced Cross Entropy Loss And Stochastic Deprecation

2019 IEEE International Conference on Multimedia and Expo (ICME), 2019

Abstract
Cross Entropy Loss (CEL) has recently proven useful in encoder-decoder based multimodal image captioning; however, it still suffers from a mismatch between the function being optimized and the evaluation metrics. In this paper, we propose a new approach for multimodal image captioning. It consists of 1) Reinforced Cross Entropy Loss (RCEL), which maximizes the probability of ground truth captions and optimizes evaluation metrics directly, and 2) Stochastic Deprecation (SD), which automatically selects high-quality ground truth sentences without losing the diversity of the corpus. The proposed RCEL and SD are generic and can each improve existing natural language generation models, while combining them (RCEL-SD) achieves the best results. Experimental results on the benchmark MSCOCO dataset show that the proposed RCEL-SD outperforms CEL on all seven evaluation metrics across three recent image captioning models.
Keywords
cross entropy loss, reinforced cross entropy loss, stochastic deprecation