Cross-modal Embeddings for Video and Audio Retrieval

COMPUTER VISION - ECCV 2018 WORKSHOPS, PT IV (2019)

Abstract
In this work, we explore the multi-modal information provided by the YouTube-8M dataset by projecting its audio and visual features into a common feature space to obtain joint audio-visual embeddings. The resulting audio-visual links are used to retrieve audio samples that fit a given silent video, and also to retrieve images that match a given query audio. Recall@K results obtained over a subset of YouTube-8M videos show the potential of this unsupervised approach for cross-modal feature learning.
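To make the Recall@K metric concrete, the following is a minimal sketch of how it could be computed for video-to-audio retrieval once both modalities live in a common space. The abstract does not specify the similarity measure or the evaluation protocol, so cosine similarity and the convention that the query at index i is paired with the candidate at index i are assumptions here. The names `cosine`, `recall_at_k`, `video_emb`, and `audio_emb` are hypothetical, and the toy vectors are illustrative, not data from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(queries, candidates, k):
    """Fraction of queries whose paired candidate ranks within the top k.

    Assumes queries[i] and candidates[i] come from the same video.
    """
    hits = 0
    for i, query in enumerate(queries):
        sims = [cosine(query, cand) for cand in candidates]
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(candidates)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy joint embeddings (illustrative values only).
video_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_emb = [[0.9, 0.1], [0.8, 0.6], [0.2, 1.0]]

for k in (1, 2):
    print(f"Recall@{k} = {recall_at_k(video_emb, audio_emb, k):.3f}")
```

Swapping the two arguments gives the audio-to-video direction under the same assumptions.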
Keywords
Cross-modal, Retrieval, YouTube-8M