Progressive Multi-Stage Neural Audio Codec with Psychoacoustic Loss and Discriminator

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
In this paper, we improve the efficiency of the progressive multi-stage neural audio codec (PR-Codec) by utilizing perceptually motivated training criteria. Although our baseline PR-Codec successfully reconstructs full-band signals by progressively decoding the pre-defined subband signals, transparent quality can be guaranteed only at high bit-rates. To reduce the bit-rate while maintaining perceptually transparent quality, we adopt a psychoacoustic model (PAM)-based loss and propose a perceptual weighting discriminator (PWD), which together enable us to synthesize and discriminate audio signals in a perceptually motivated domain. We also introduce scalar quantization with an entropy model to further enhance quantization efficiency. Our experimental results show that, compared to our previous model, the proposed model significantly improves perceptual reconstruction quality at the expense of increased waveform disparity in the time domain.
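As a rough illustration of why scalar quantization is paired with an entropy model, the following minimal sketch quantizes values uniformly and estimates the bits per symbol an ideal entropy coder would need. It is not the paper's method: the abstract does not describe the quantizer or the entropy model, so the empirical histogram used here stands in for whatever model the authors use, and the names `quantize`, `dequantize`, `empirical_entropy_bits`, and the `step` parameter are all hypothetical.

```python
import math
from collections import Counter


def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer index.

    Uses Python's built-in round(), which rounds exact ties to the even integer.
    """
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from the integer indices."""
    return [i * step for i in indices]


def empirical_entropy_bits(indices):
    """Average bits per symbol under the empirical symbol distribution.

    ASSUMPTION: a histogram replaces the (unspecified) entropy model of the paper.
    """
    counts = Counter(indices)
    n = len(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    latent = [0.0, 0.1, 0.9, 1.1, -1.0]  # made-up example values
    for step in (0.25, 1.0):
        idx = quantize(latent, step)
        rec = dequantize(idx, step)
        err = max(abs(a - b) for a, b in zip(latent, rec))
        print(step, idx, empirical_entropy_bits(idx), err)
```

A coarser `step` merges more values into the same index, which tends to lower the estimated bits per symbol while raising the reconstruction error; this is the rate-distortion trade-off that an entropy-modelled quantizer is meant to manage.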
Keywords
Audio coding, deep neural network, generative adversarial network, psychoacoustics, perceptual loss