Conditional Conformer: Improving Speaker Modulation For Single And Multi-User Speech Enhancement

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2023)

引用 0|浏览42
暂无评分
摘要
Recently, Feature-wise Linear Modulation (FiLM) has been shown to outperform other approaches to incorporate speaker embedding into speech separation and VoiceFilter models. We propose an improved method of incorporating such embeddings into a Voice- Filter frontend for automatic speech recognition (ASR) and text- independent speaker verification (TI-SV). We extend the widely- used Conformer architecture to construct a FiLM Block with additional feature processing before and after the FiLM layers. Apart from its application to single-user VoiceFilter, we show that our system can be easily extended to multi-user VoiceFilter models via element-wise max pooling of the speaker embeddings in a projected space. The final architecture, which we call Conditional Conformer, tightly integrates the speaker embeddings into a Conformer backbone. We improve TI-SV equal error rates by as much as 56% over prior multi-user VoiceFilter models, and our element-wise max pooling reduces relative WER compared to an attention mechanism by as much as 10%.
更多
查看译文
关键词
Noise robust ASR,Speaker embedding,Voice-Filter
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要