Self-Supervised Generation of Spatial Audio for 360° Video

Neural Information Processing Systems (2018)

Cited by 143 | Views 41
Abstract
We introduce an approach to convert mono audio recorded by a 360° video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. Spatial audio is an important component of immersive 360° video viewing, but spatial audio microphones are still rare in current 360° video production. Our system consists of end-to-end trainable neural networks that separate individual sound sources and localize them on the viewing sphere, conditioned on multi-modal analysis of audio and 360° video frames. We introduce several datasets, including one filmed ourselves, and one collected in-the-wild from YouTube, consisting of 360° videos uploaded with spatial audio. During training, ground-truth spatial audio serves as self-supervision and a mixed-down mono track forms the input to our network. Using our approach, we show that it is possible to infer the spatial location of sound sources based only on 360° video and a mono audio track.
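The abstract does not specify the output format, but 360° spatial audio of the kind uploaded to YouTube is typically first-order ambisonics (B-format). As background, a minimal sketch of encoding a mono source at a known direction into the four ambisonic channels — the function name, ACN channel order, and SN3D normalization are my assumptions, not details taken from the paper:

```python
import numpy as np

def encode_foa(mono, azimuth, elevation):
    """Encode a mono signal into first-order ambisonics (ACN order, SN3D).

    mono: 1-D array of samples; azimuth (counter-clockwise from front)
    and elevation (upward) in radians. Returns channels (W, Y, Z, X).
    """
    w = mono                                          # omnidirectional
    y = mono * np.sin(azimuth) * np.cos(elevation)    # left-right
    z = mono * np.sin(elevation)                      # up-down
    x = mono * np.cos(azimuth) * np.cos(elevation)    # front-back
    return np.stack([w, y, z, x])

# A 1 kHz tone placed directly to the listener's left (azimuth = +90 deg)
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
foa = encode_foa(tone, np.pi / 2, 0.0)
```

Recovering the source direction from these channels given only their mono mix-down (here, the W channel) is exactly the inverse problem the paper's networks learn, using the 360° video frames as the extra cue.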
Keywords
deep neural networks, individual sound, audio track, video camera, 360° video, spatial audio, video production