Image and Video Captioning with Augmented Neural Architectures.

IEEE MultiMedia (2018)

Abstract
Neural-network-based image and video captioning can be substantially improved by utilizing architectures that make use of special features from the scene context, objects, and locations. A novel discriminatively trained evaluator network for choosing the best caption among those generated by an ensemble of caption generator networks further improves accuracy.
Keywords
Feature extraction, Neural networks, Computational modeling, Multimedia communication, Object recognition, Detectors