Evaluation Metrics for Generative Speech Enhancement Methods: Issues and Perspectives

Jan Pirklbauer, Marvin Sach, Kristoff Fluyt, Wouter Tirry, Wafaa Wardah, Sebastian Moeller, Tim Fingscheidt

Speech Communication; 15th ITG Conference (2023)

Abstract
Generative speech enhancement methods commonly employ components of text-to-speech (TTS) systems to suppress noise and enhance speech quality. They have gained traction recently, as they can produce a clean, virtually noise-free speech estimate. However, they come with unique error types such as mumbled speech and substituted phonemes, which are often not recognized by common non-intrusive speech quality metrics such as NISQA and DNSMOS. Intrusive metrics such as PESQ and STOI, on the other hand, are also unreliable because they depend on audio similarity, and are therefore rarely adopted in TTS research. In this work, we provide insights into typical issues in the instrumental evaluation of generative approaches to speech enhancement. Furthermore, we propose the Levenshtein phoneme distance (LPD), which helps to catch and interpret the unique error types evoked by generative approaches. Finally, we propose best practices for interpreting metrics for generative approaches, pointing out that PESQ is indeed useful for evaluating generative speech enhancement in low-SNR conditions, while NISQA and DNSMOS perform well at mid to high SNR.
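To make the idea behind a phoneme-level Levenshtein distance concrete, the following is a minimal sketch and not the authors' implementation. It assumes the clean reference and the enhanced signal have already been transcribed into phoneme sequences by some phoneme recognizer; the abstract does not specify the recognizer, the phoneme set, or any normalization. The function name `levenshtein` and the example phoneme labels are hypothetical.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences.

    Insertions, deletions, and substitutions each cost 1
    (unit costs are an assumption of this sketch).
    """
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty sequence
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical phoneme sequences: reference vs. enhanced output
ref_phonemes = ["s", "p", "iy", "ch"]
hyp_phonemes = ["s", "b", "iy"]  # one substituted phoneme, one dropped phoneme

distance = levenshtein(ref_phonemes, hyp_phonemes)
print(distance)
```

In this sketch a substituted phoneme and a dropped phoneme each add one to the distance, which matches the kind of error the abstract describes as going unnoticed by quality-only metrics.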