Multimodal City-Verification On Flickr Videos Using Acoustic And Textual Features

ICASSP (2012)

Cited by 26 | Viewed 19
Abstract
We have performed city verification of videos based on their audio and metadata, using videos from the MediaEval Placing Task's video set, which contains consumer-produced "from-the-wild" videos. Eighteen cities were used as targets, for which acoustic and language models were trained and against which test videos were scored. We have obtained the first known results for the city-verification task, with a minimum EER of 21.8%, suggesting that roughly 80% of test videos, when tested against the correct target city, were identified as belonging to that city. This result is well above chance, even though the videos contained very few city-specific audio and metadata features. We have also demonstrated the complementarity of audio and metadata for this task.
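As a rough illustration of how the reported EER relates to the "roughly 80%" figure: at the EER operating point the false-acceptance and false-rejection rates are equal, so an EER of 21.8% implies that about 78% of correct-city trials are accepted. The sketch below is a minimal, generic equal-error-rate computation over verification scores; the score arrays, the threshold sweep, and the function name compute_eer are hypothetical illustrations and not taken from the paper, whose actual system scores each video against per-city acoustic and N-gram language models.

    # Minimal sketch (not from the paper): estimating an equal error rate (EER)
    # from hypothetical genuine (correct-city) and impostor (wrong-city) scores.
    import numpy as np

    def compute_eer(genuine_scores, impostor_scores):
        """Return the EER given scores for correct-city and wrong-city trials."""
        thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
        best_gap, eer = np.inf, 1.0
        for t in thresholds:
            far = np.mean(impostor_scores >= t)   # false-acceptance rate
            frr = np.mean(genuine_scores < t)     # false-rejection rate
            if abs(far - frr) < best_gap:         # keep the most balanced point
                best_gap, eer = abs(far - frr), (far + frr) / 2
        return eer

    # Synthetic example: higher score means "more likely the target city".
    rng = np.random.default_rng(0)
    genuine = rng.normal(1.0, 1.0, 500)
    impostor = rng.normal(0.0, 1.0, 500)
    print(f"EER = {compute_eer(genuine, impostor):.3f}")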
Keywords
City verification, acoustic models, N-gram language models, multimodal processing