Solving Size and Performance Dilemma by Reversible and Invertible Recurrent Network for Speech Enhancement

AIPR (2022)

Abstract
Reducing the number of parameters while improving system performance is widely regarded as a dilemma: shrinking a model typically degrades its performance, while improving performance usually requires more parameters. To resolve this dilemma, we propose a reversible and invertible recurrent (RAIR) network. First, we construct a reversible dual-path architecture that avoids information loss for two arbitrary functions, F and G; no matter how F and G are chosen and no matter how small the model is, feature maps pass through the network without any loss of information. Second, we adopt an invertible 1x1 convolution to improve the remixing of channel information. Finally, within this reversible architecture we employ a dual-path recurrences (DPR) block, which operates separately along the frequency and time dimensions, as the F function and a 3x3 convolution as the G function, reducing the parameter count dramatically. Despite the model's small size, experiments on Voice Bank + DEMAND show that our reversible and invertible recurrent architecture improves all performance metrics: COVL from 3.57 to 3.78, wideband PESQ from 2.94 to 3.15, and STOI from 0.947 to 0.951. The proposed model achieves state-of-the-art results with only 190K parameters; to the best of our knowledge, it is the smallest model to reach state-of-the-art performance.
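For concreteness, the sketch below shows one plausible PyTorch realization of the three ingredients named in the abstract: an additive coupling y1 = x1 + F(x2), y2 = x2 + G(y1), which is exactly invertible no matter what F and G compute; a dual-path recurrences block for F; and a Glow-style invertible 1x1 convolution for channel remixing. The module names, layer widths, and the GRU-based realization of the DPR block are illustrative assumptions, not the paper's verified implementation.

```python
# Minimal sketch of the RAIR building blocks (assumed hyperparameters).
import torch
import torch.nn as nn


class DPRBlock(nn.Module):
    """Dual-path recurrences: one RNN pass along frequency, one along time."""

    def __init__(self, channels: int):
        super().__init__()
        self.freq_rnn = nn.GRU(channels, channels, batch_first=True,
                               bidirectional=True)
        self.freq_proj = nn.Linear(2 * channels, channels)
        self.time_rnn = nn.GRU(channels, channels, batch_first=True)
        self.time_proj = nn.Linear(channels, channels)

    def forward(self, x):  # x: (batch, channels, time, freq)
        b, c, t, f = x.shape
        # Frequency path: each time frame becomes a sequence over frequency.
        h = x.permute(0, 2, 3, 1).reshape(b * t, f, c)
        h = self.freq_proj(self.freq_rnn(h)[0])
        x = x + h.view(b, t, f, c).permute(0, 3, 1, 2)
        # Time path: each frequency bin becomes a sequence over time.
        h = x.permute(0, 3, 2, 1).reshape(b * f, t, c)
        h = self.time_proj(self.time_rnn(h)[0])
        return x + h.view(b, f, t, c).permute(0, 3, 2, 1)


class ReversibleBlock(nn.Module):
    """Additive coupling: exactly invertible for *arbitrary* F and G."""

    def __init__(self, channels: int):
        super().__init__()
        self.f = DPRBlock(channels)                           # F: DPR block
        self.g = nn.Conv2d(channels, channels, 3, padding=1)  # G: 3x3 conv

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):  # exact reconstruction, no information loss
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2


class Invertible1x1Conv(nn.Module):
    """Channel remixing via an orthogonally initialized 1x1 convolution."""

    def __init__(self, channels: int):
        super().__init__()
        w = torch.linalg.qr(torch.randn(channels, channels))[0]
        self.weight = nn.Parameter(w.view(channels, channels, 1, 1))

    def forward(self, x):
        return nn.functional.conv2d(x, self.weight)

    def inverse(self, y):
        w_inv = torch.inverse(self.weight.squeeze(-1).squeeze(-1))
        return nn.functional.conv2d(y, w_inv.view_as(self.weight))


# Round-trip check: the coupling reconstructs its input up to float error.
block = ReversibleBlock(channels=8)
x1, x2 = torch.randn(1, 8, 16, 32), torch.randn(1, 8, 16, 32)
y1, y2 = block(x1, x2)
r1, r2 = block.inverse(y1, y2)
assert torch.allclose(r1, x1, atol=1e-4) and torch.allclose(r2, x2, atol=1e-4)
```

The round-trip check at the end illustrates the property the abstract relies on: because the coupling can be undone exactly, shrinking F and G does not cause the feature maps themselves to lose information.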