RNN-BLSTM Based Multi-Pitch Estimation

Jianshu Zhang, Jian Tang, Li-Rang Dai

17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), Vols 1-5: Understanding Speech Processing in Humans and Machines (2016)

Abstract

Multi-pitch estimation is critical in many applications, including computational auditory scene analysis (CASA), speech enhancement and separation, and mixed-speech analysis; despite much effort, however, it remains a challenging problem. This paper uses the PEFAC algorithm to extract features and proposes a recurrent neural network with bidirectional long short-term memory (RNN-BLSTM) to model the two pitch contours of a mixture of two speech signals. Unlike feedforward deep neural networks (DNNs), which are trained on static frame-level acoustic features, the RNN-BLSTM is trained on sequences of frame-level features and is therefore better able to learn the temporal dynamics of pitch contours. Evaluations on a speech dataset containing mixtures of two speech signals show that the RNN-BLSTM can substantially outperform the DNN in multi-pitch estimation of mixed speech.
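To illustrate why a bidirectional recurrent model sees more temporal context than a frame-by-frame DNN, the following is a minimal sketch of a BLSTM forward pass that maps a sequence of feature frames to two values per frame (one per speaker). It is not the authors' implementation. The layer sizes, the random untrained weights, the random stand-in for PEFAC features, and the choice of two real-valued outputs per frame are all assumptions made for illustration; the abstract does not specify the output representation or training procedure.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) weights for one LSTM direction.

    Four gates (input, forget, output, candidate), each a matrix that
    acts on the concatenation [x; h; 1]. Sizes are illustrative only.
    """
    cols = input_dim + hidden_dim + 1
    return [[[rng.uniform(-0.1, 0.1) for _ in range(cols)]
             for _ in range(hidden_dim)] for _ in range(4)]


def lstm_pass(frames, weights, hidden_dim):
    """Run one LSTM direction over a sequence; return the hidden state per frame."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    outputs = []
    for x in frames:
        z = list(x) + h + [1.0]
        pre = [[sum(w * v for w, v in zip(row, z)) for row in gate]
               for gate in weights]
        i = [sigmoid(a) for a in pre[0]]      # input gate
        f = [sigmoid(a) for a in pre[1]]      # forget gate
        o = [sigmoid(a) for a in pre[2]]      # output gate
        g = [math.tanh(a) for a in pre[3]]    # candidate cell update
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)
    return outputs


def blstm_two_pitch(frames, fwd, bwd, out_w, hidden_dim):
    """Combine forward and backward passes, then project to two outputs per frame."""
    forward = lstm_pass(frames, fwd, hidden_dim)
    # The backward direction reads the sequence in reverse; its outputs are
    # re-reversed so that index t lines up with frame t.
    backward = lstm_pass(frames[::-1], bwd, hidden_dim)[::-1]
    result = []
    for hf, hb in zip(forward, backward):
        z = hf + hb + [1.0]
        result.append([sum(w * v for w, v in zip(row, z)) for row in out_w])
    return result


# Hypothetical sizes; random numbers stand in for PEFAC-derived features.
rng = random.Random(0)
feat_dim, hidden_dim, num_frames = 6, 4, 10
frames = [[rng.random() for _ in range(feat_dim)] for _ in range(num_frames)]
fwd = init_lstm(feat_dim, hidden_dim, rng)
bwd = init_lstm(feat_dim, hidden_dim, rng)
out_w = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim + 1)]
         for _ in range(2)]

pitches = blstm_two_pitch(frames, fwd, bwd, out_w, hidden_dim)
print(len(pitches), len(pitches[0]))
```

Because the backward pass carries information from later frames, the output at each frame depends on the whole sequence, which is the property the abstract credits for modeling pitch contour dynamics. A frame-level DNN, by contrast, would compute each output from that frame's features alone.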

Keywords

multi-pitch estimation, neural networks, RNN-BLSTM, PEFAC
