PicSOM and EURECOM Experiments in TRECVID 2019 Workshop notebook paper

Héctor Laria Mantecón, Jorma Laaksonen, Danny Francis, Benoit Huet

semanticscholar (2020)

Abstract
This year, the PicSOM and EURECOM teams participated only in the Description Generation subtask of the Video to Text Description (VTT) task. Both groups submitted one or two runs labeled as "MeMAD" submissions, stemming from a joint EU H2020 research project of that name. In total, the PicSOM team submitted four runs and EURECOM one run. The goal of the PicSOM submissions was to study the effect of using image features, video features, or both. The goal of the EURECOM submission was to experiment with the use of curriculum learning in video captioning. The five submitted runs are as follows:

• PICSOM.1-MEMAD.PRIMARY: uses ResNet and I3D features for initialising the LSTM generator, and is trained on MS COCO + TGIF using self-critical loss,
• PICSOM.2-MEMAD: uses I3D features as initialisation, and is trained on TGIF using self-critical loss,
• PICSOM.3: uses ResNet features as initialisation, and is trained on MS COCO + TGIF using self-critical loss,
• PICSOM.4: is the same as PICSOM.1-MEMAD.PRIMARY except that the loss function used is cross-entropy,
• EURECOM.MEMAD.PRIMARY: uses I3D features to initialise a GRU generator, and is trained on TGIF + MSR-VTT + MSVD with cross-entropy and curriculum learning.

The runs aim to compare the cross-entropy and self-critical training loss functions, and to show whether both still-image and video features can be used successfully even though the COCO dataset does not allow the extraction of I3D video features. Based on the results of the runs, it seems that using both video and still-image features yields better captioning results than either single modality alone. The proposed curriculum learning process does not seem to be beneficial.
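As background for the cross-entropy versus self-critical comparison above, self-critical sequence training (SCST) is a REINFORCE-style objective that weights each sampled caption's log-probability by its reward advantage over a greedy-decoding baseline. The sketch below is an illustration of that general idea, not the teams' actual training code; the function name and the use of per-caption summed log-probabilities and CIDEr-like scalar rewards are assumptions for the example.

```python
def scst_loss(sample_logprobs, sample_rewards, baseline_rewards):
    """Self-critical sequence training loss (REINFORCE with a
    greedy-decoding baseline), sketched for captioning.

    sample_logprobs:  summed log-probability of each sampled caption
    sample_rewards:   reward (e.g. CIDEr score) of each sampled caption
    baseline_rewards: reward of the greedily decoded caption for the
                      same video clip
    """
    loss = 0.0
    for logp, r, b in zip(sample_logprobs, sample_rewards, baseline_rewards):
        # Captions that beat the greedy baseline (advantage r - b > 0)
        # have their probability pushed up; worse ones are pushed down.
        loss += -(r - b) * logp
    return loss / len(sample_logprobs)
```

In contrast, cross-entropy training (as in PICSOM.4) simply maximises the log-probability of the ground-truth captions, so SCST lets the generator optimise the evaluation metric directly at the cost of a noisier gradient.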