Audio Texture Synthesis With Random Neural Networks: Improving Diversity And Quality

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)(2019)

引用 15|浏览57
暂无评分
摘要
Texture synthesis techniques based on matching the Gram matrix of feature activations in neural networks have achieved spectacular success in the image domain. In this paper we extend these techniques to the audio domain. We demonstrate that synthesizing diverse audio textures is challenging, and argue that this is because audio data is relatively low-dimensional. We therefore introduce two new terms to the original Grammian loss: an autocorrelation term that preserves rhythm, and a diversity term that encourages the optimization procedure to synthesize unique textures. We quantitatively study the impact of our design choices on the quality of the synthesized audio by introducing an audio analogue to the Inception loss which we term the VGGish loss. We show that there is a trade-off between the diversity and quality of the synthesized audio using this technique. Finally we perform a number of experiments to qualitatively study how these design choices impact the quality of the synthesized audio.
更多
查看译文
关键词
synthesized audio,audio texture synthesis,random neural networks,texture synthesis techniques,Gram matrix,feature activations,autocorrelation term,diversity term,Grammian loss,rhythm preservation,audio quality,inception loss,VGGish loss,diversity-quality trade-off
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要