Facebook acoustic events dataset

ICASSP 2018

Abstract
The introduction of large-scale datasets such as ImageNet, AudioSet, YouTube-8M and Kinetics has greatly advanced the state-of-the-art in machine perception. These datasets primarily focus on single modalities of audio or visual cues. With this work, which introduces the Facebook Acoustic Events dataset, we seek to broaden the scope of machine perception to multimodal understanding. This is a human-labeled dataset that contains acoustic event labels for 500K segments drawn from a random sample of public Facebook videos. Combined with its visual counterpart, labeled with scenes, objects and actions, it is intended to make research in multimodal learning and video understanding more accessible and convenient. We provide a well-balanced dataset for acoustic event classification together with comprehensive benchmarks on both single-modal and multimodal acoustic event detection experiments using novel CNN-based architectures.
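As a rough illustration of the kind of CNN-based acoustic event classifier the benchmarks refer to, the sketch below classifies log-mel spectrogram segments. This is not the paper's architecture: the layer sizes, `NUM_CLASSES`, and the input shape (`N_MELS`, `N_FRAMES`) are illustrative assumptions.

```python
# Minimal sketch of a CNN acoustic event classifier over log-mel spectrograms.
# NOT the paper's model; all hyperparameters below are assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 50            # assumption: set to the dataset's label count
N_MELS, N_FRAMES = 64, 96   # assumption: 64 mel bands x ~1 s of frames

class AudioEventCNN(nn.Module):
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),        # global pooling over time and frequency
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, N_MELS, N_FRAMES) log-mel spectrogram
        h = self.features(x).flatten(1)
        return self.classifier(h)           # raw logits; pair with
                                            # BCEWithLogitsLoss for multi-label
                                            # event detection

if __name__ == "__main__":
    model = AudioEventCNN()
    dummy = torch.randn(4, 1, N_MELS, N_FRAMES)  # fake batch of spectrograms
    print(model(dummy).shape)                    # torch.Size([4, 50])
```

A multimodal variant, as studied in the paper's benchmarks, would fuse such audio features with visual features (e.g. by concatenation before the classifier); the exact fusion scheme used by the authors is described in the paper itself.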