Human Consensus-Oriented Image Captioning

IJCAI 2020

Abstract
Image captioning aims to describe an image with a concise, accurate, and interesting sentence. To build such an automatic neural captioner, traditional models align the generated words with a number of human-annotated sentences to mimic human-like captions. However, crowd-sourced annotations inevitably come with data-quality issues such as grammatical errors, misidentified visual objects, and sub-optimal sentence focus. During training, existing methods treat all annotations equally regardless of their quality. In this work, we explicitly use human consensus to measure the quality of the ground-truth captions in advance, and directly encourage the model to learn high-quality captions with higher priority. Consequently, the proposed consensus-oriented method can accelerate training and achieve superior performance with only a supervised objective, without time-consuming reinforcement learning. The novel consensus loss can be incorporated into most existing state-of-the-art methods, improving BLEU-4 by up to 12.47% (relative) compared to the conventional cross-entropy loss. Extensive experiments on the MS-COCO image captioning dataset demonstrate that the proposed human consensus-oriented training method significantly improves both training efficiency and model effectiveness.
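The abstract does not spell out how consensus is measured or how it enters the loss, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes (1) a reference caption's "consensus" is its mean n-gram similarity to the other references for the same image, using a simple Dice overlap as a hypothetical stand-in for whatever metric the paper actually uses (e.g. a CIDEr-style score); (2) scores become per-caption weights through a softmax with a hypothetical `temperature` parameter; and (3) the loss is a weighted sum of per-reference negative log-likelihoods, which reduces to ordinary averaged cross-entropy when the weights are uniform. All function names are illustrative, not the authors' code.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def caption_similarity(a: Sequence[str], b: Sequence[str], max_n: int = 2) -> float:
    """Mean Dice overlap of 1..max_n-gram multisets.

    Hypothetical stand-in for a captioning metric such as CIDEr or BLEU.
    """
    total = 0.0
    for n in range(1, max_n + 1):
        ca, cb = ngram_counts(a, n), ngram_counts(b, n)
        denom = sum(ca.values()) + sum(cb.values())
        if denom:
            # Counter & keeps the minimum count of each shared n-gram (clipped matches).
            total += 2.0 * sum((ca & cb).values()) / denom
    return total / max_n


def consensus_scores(references: List[str]) -> List[float]:
    """Score each reference by its mean similarity to the other references."""
    if len(references) < 2:
        raise ValueError("need at least two references to measure consensus")
    toks = [r.lower().split() for r in references]
    scores = []
    for i, ti in enumerate(toks):
        sims = [caption_similarity(ti, tj) for j, tj in enumerate(toks) if j != i]
        scores.append(sum(sims) / len(sims))
    return scores


def consensus_weights(scores: List[float], temperature: float = 0.1) -> List[float]:
    """Softmax over consensus scores (assumed scheme); a lower temperature
    concentrates more weight on high-consensus captions."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def consensus_loss(caption_nlls: List[float], weights: List[float]) -> float:
    """Weighted sum of per-reference negative log-likelihoods.

    With uniform weights 1/K this equals the usual cross-entropy averaged
    over the K references.
    """
    return sum(w * nll for w, nll in zip(weights, caption_nlls))


if __name__ == "__main__":
    refs = [
        "a dog runs on the grass",
        "a dog running on green grass",
        "a brown dog runs across the grass",
        "a cat sits on a sofa",  # disagrees with the other annotators
    ]
    scores = consensus_scores(refs)
    weights = consensus_weights(scores)
    for ref, s, w in zip(refs, scores, weights):
        print(f"{s:.3f}  {w:.3f}  {ref}")
```

In this toy example the outlier caption receives the lowest consensus score and therefore the smallest weight, so it contributes least to the training signal. Because the weights are computed once from the references, before training, this kind of reweighting adds no cost inside the training loop.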