Learning Filterbanks from Raw Speech for Phone Recognition
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
Abstract
We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that for several architectures, models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We get our best performance by learning all front-end steps, from pre-emphasis up to averaging. Finally, we observe that the filters at convergence have an asymmetric impulse response, and that some of them remain almost analytic.
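The front-end chain described in the abstract (pre-emphasis, complex filtering of the raw waveform, averaging) can be sketched roughly as follows. This is a minimal illustration at initialization only, not the authors' implementation: the abstract gives no hyperparameters, so the Gabor filter shape, the 40 filters, the 400-sample window, the 160-sample stride at 16 kHz, the 0.97 pre-emphasis coefficient, the bandwidth heuristic, the squared-Hann low-pass, and the log compression are all assumptions, and the function names are hypothetical. In the paper every one of these steps is fine-tuned jointly with the network, whereas here the filters stay fixed.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def gabor_bank(n_filters=40, width=400, sr=16000, fmin=0.0, fmax=8000.0):
    """Complex Gabor filters with mel-spaced centre frequencies (assumed init)."""
    # n_filters + 2 mel-spaced edges; filter k is centred on edge k + 1.
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2))
    centres = edges[1:-1]
    bandwidths = edges[2:] - edges[:-2]  # Hz, span of the matching mel triangle
    t = (np.arange(width) - (width - 1) / 2.0) / sr  # seconds, centred on zero
    # Heuristic (assumption): frequency std = bandwidth / 4, so time std = 2 / (pi * bw).
    sigma_t = 2.0 / (np.pi * bandwidths)
    envelope = np.exp(-0.5 * (t[None, :] / sigma_t[:, None]) ** 2)
    carrier = np.exp(2j * np.pi * centres[:, None] * t[None, :])
    filters = envelope * carrier
    filters /= np.linalg.norm(filters, axis=1, keepdims=True)  # unit L2 norm
    return filters, centres

def td_filterbank(wave, filters, stride=160, preemph=0.97, eps=1e-6):
    """Pre-emphasis -> complex filtering -> squared modulus -> averaging -> log."""
    x = np.concatenate(([wave[0]], wave[1:] - preemph * wave[:-1]))
    width = filters.shape[1]
    # Squared modulus of each complex filter output.
    energy = np.stack([np.abs(np.convolve(x, h, mode="valid")) ** 2 for h in filters])
    # Averaging: normalized squared-Hann low-pass, then decimation by `stride`.
    lowpass = np.hanning(width) ** 2
    lowpass /= lowpass.sum()
    smooth = np.stack([np.convolve(e, lowpass, mode="valid")[::stride] for e in energy])
    return np.log(smooth + eps)

# Usage: a pure tone at the centre frequency of filter 20 (one second at 16 kHz).
sr = 16000
filters, centres = gabor_bank(sr=sr)
time = np.arange(sr) / sr
tone = np.sin(2 * np.pi * centres[20] * time)
feats = td_filterbank(tone, filters)
print(feats.shape, int(feats.mean(axis=1).argmax()))
```

The output has one row per filter and one column per frame, which is the same layout as a log mel-filterbank feature matrix, so a downstream convolutional model could consume either. Making `filters`, the pre-emphasis coefficient, and the low-pass window trainable parameters is what would turn this fixed front-end into a learned one.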
Keywords
phone recognition experiments, front-end steps, raw speech, complex filters, raw waveform, convolutional neural network, end-to-end phone recognition, time-domain filter banks, mel-filter bank approximation, TIMIT, asymmetric impulse response, TD-filter banks, convolutional architecture