AudioViewer: Learning to Visualize Sounds

Chunjin Song,Yuchi Zhang, Willis Peng, Parmis Mohaghegh,Bastian Wandt,Helge Rhodin

WACV(2023)

引用 0|浏览10
暂无评分
摘要
A long-standing goal in the field of sensory substitution is enabling sound perception for deaf and hard of hearing (DHH) people by visualizing audio content. Different from existing models that translate to hand sign language, between speech and text, or text and images, we target immediate and low-level audio to video translation that applies to generic environment sounds as well as human speech. Since such a substitution is artificial, without labels for supervised learning, our core contribution is to build a mapping from audio to video that learns from unpaired examples via high-level constraints. For speech, we additionally disentangle content from style, such as gender and dialect. Qualitative and quantitative results, including a human study, demonstrate that our unpaired translation approach maintains important audio features in the generated video and that videos of faces and numbers are well suited for visualizing high-dimensional audio features that can be parsed by humans to match and distinguish between sounds and words. Project website: https://chunjinsong.github.io/audioviewer
更多
查看译文
关键词
visualize sounds,audioviewer,learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要