Target Selection Strategies for Demucs-Based Speech Enhancement

APPLIED SCIENCES-BASEL(2023)

引用 0|浏览2
暂无评分
摘要
Featured Application Online audio applications such as real-time automatic speech recognition, sound source localization in service robotics, hearing prosthesis, mobile telecommunication, and teleconferencing. The Demucs-Denoiser model has been recently shown to achieve a high level of performance for online speech enhancement, but assumes that only one speech source is present in the fed mixture. In real-life multiple-speech-source scenarios, it is not certain which speech source will be enhanced. To correct this issue, two target selection strategies for the Demucs-Denoiser model are proposed and evaluated: (1) an embedding-based strategy, using a codified sample of the target speech, and (2) a location-based strategy, using a beamforming-based prefilter to select the target that is in front of a two-microphone array. In this work, it is shown that while both strategies improve the performance of the Demucs-Denoiser model when one or more speech interferences are present, they both have their pros and cons. Specifically, the beamforming-based strategy achieves overall a better performance (increasing the output SIR between 5 and 10 dB) compared to the embedding-based strategy (which only increases the output SIR by 2 dB and only in low-input-SIR scenarios). However, the beamforming-based strategy is sensitive against the location variation of the target speech source (decreasing the output SIR by 10 dB if the target speech source is located only 0.1 m from its expected position), which the embedding-based strategy does not suffers from.
更多
查看译文
关键词
Demucs,target selection,embeddings,phase-based beamforming
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要