FastFoley: Non-autoregressive Foley Sound Generation Based on Visual Semantics

Sipan Li, L Zhang, Chenyu Dong, Haiwei Xue, Zhiyong Wu, Lifa Sun, Kun Li, Helen Meng

Communications in Computer and Information Science (2023)

Abstract
Foley sound in movies and TV episodes is of great importance for giving the audience a more realistic feeling. Traditionally, foley artists must use their expertise to create foley sound that is synchronous with the content occurring in the video, which is laborious and time-consuming. In this paper, we present FastFoley, a Transformer-based, non-autoregressive deep-learning method that synthesizes a foley audio track from a silent video clip. Existing cross-modal generation methods are still based on autoregressive models such as long short-term memory (LSTM) recurrent neural networks. FastFoley offers a new non-autoregressive framework for this audio-visual task: given a video, it synthesizes the associated audio and outperforms LSTM-based methods in time synchronization, sound quality, and sense of reality. We have also created a dataset called the Audio-Visual Foley Dataset (AVFD) for related foley work and made it open-source; it can be downloaded at https://github.com/thuhcsi/icassp2022-FastFoley .
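The abstract only describes the approach at a high level. As a rough illustration of what a non-autoregressive, Transformer-based video-to-audio model can look like, below is a minimal PyTorch sketch: visual features are encoded by a Transformer and all mel-spectrogram frames are predicted in one parallel pass, rather than frame by frame as in an autoregressive LSTM. All module names, dimensions, and the fixed-rate upsampling from video rate to mel rate are assumptions for illustration, not the authors' actual FastFoley architecture.

```python
import torch
import torch.nn as nn

class NonAutoregressiveFoleySketch(nn.Module):
    """Illustrative sketch (not the official FastFoley model): maps per-frame
    visual features to a mel-spectrogram without autoregression over audio."""
    def __init__(self, visual_dim=512, d_model=256, n_mels=80,
                 n_layers=4, n_heads=4, mel_frames_per_video_frame=4):
        super().__init__()
        self.in_proj = nn.Linear(visual_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        # Simple fixed-rate upsampling from video frame rate to mel frame rate
        # (an assumption made here for brevity).
        self.upsample = mel_frames_per_video_frame
        self.mel_head = nn.Linear(d_model, n_mels)

    def forward(self, visual_feats):
        # visual_feats: (batch, T_video, visual_dim), e.g. CNN features per frame
        x = self.in_proj(visual_feats)
        x = self.encoder(x)                            # contextualize across time
        x = x.repeat_interleave(self.upsample, dim=1)  # video rate -> mel rate
        return self.mel_head(x)                        # (batch, T_mel, n_mels)

if __name__ == "__main__":
    model = NonAutoregressiveFoleySketch()
    video = torch.randn(2, 25, 512)   # 2 clips, 25 frames of visual features each
    mel = model(video)                # every mel frame predicted in a single pass
    print(mel.shape)                  # torch.Size([2, 100, 80])
```

The point of the sketch is the non-autoregressive property: the predicted spectrogram depends only on the visual input, not on previously generated audio frames, which is what distinguishes this family of models from the LSTM-based autoregressive baselines mentioned in the abstract.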
Keywords
sound, non-autoregressive