Time-Frequency Kernel-Based CNN for Speech Recognition

16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols. 1-5 (2015)

Abstract
We propose a novel approach to generating time-frequency kernel-based deep convolutional neural networks (CNNs) for robust speech recognition. In the 2D convolution, we treat shifts along the time axis and the frequency axis of the speech feature representation differently. This achieves some invariance to small frequency shifts while expanding the time context of the speech input without smearing the time positions of phone segments. The 2D-kernel approach makes deep CNNs easy to implement. We present experimental results on speaker-independent phone recognition tasks on TIMIT and FFMTIMIT, where the latter was recorded with a far-field microphone and its speech data are noisy. Our results demonstrate that the proposed time-frequency kernel-based CNN gives consistent phone error reductions over frequency-domain CNNs and DNNs on both TIMIT and FFMTIMIT, with larger benefits when recognizing noisy speech with models trained on clean speech.
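The abstract does not spell out the layer design, so the following is only a minimal sketch of the general idea of treating the two axes differently. It assumes the input is a feature map laid out as frequency bands by frames, convolved with a bank of small 2D time-frequency kernels. It also assumes that frequency-shift invariance comes from max pooling along the frequency axis only, while the time axis is never pooled so that frame positions are preserved. That pooling choice is an assumption borrowed from common frequency-domain CNN practice, not a detail confirmed by the paper. The function names `conv2d_valid` and `pool_frequency_only`, as well as all sizes below, are hypothetical.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Valid 2D cross-correlation of one feature map with a kernel bank.

    x:       (F, T) input map, F frequency bands by T frames (assumed layout).
    kernels: (K, kf, kt) bank of K time-frequency kernels.
    returns: (K, F - kf + 1, T - kt + 1) feature maps.
    """
    K, kf, kt = kernels.shape
    F, T = x.shape
    out = np.empty((K, F - kf + 1, T - kt + 1))
    for i in range(F - kf + 1):
        for j in range(T - kt + 1):
            patch = x[i:i + kf, j:j + kt]
            # One dot product per kernel over the same time-frequency patch.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def pool_frequency_only(y: np.ndarray, pf: int) -> np.ndarray:
    """Non-overlapping max pooling along the frequency axis only.

    Pooling over neighbouring bands gives some tolerance to small frequency
    shifts; the time axis is left untouched so frame positions are not smeared.
    """
    K, F, T = y.shape
    n_pools = F // pf  # drop leftover bands that do not fill a full pool
    trimmed = y[:, :n_pools * pf, :]
    return trimmed.reshape(K, n_pools, pf, T).max(axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((40, 11))         # hypothetical: 40 bands x 11 frames
    kernels = rng.standard_normal((8, 5, 3))  # hypothetical: 8 kernels, 5 bands x 3 frames
    y = conv2d_valid(x, kernels)
    z = pool_frequency_only(y, pf=3)
    print(y.shape)  # (8, 36, 9)
    print(z.shape)  # (8, 12, 9)
```

In this sketch the frequency dimension shrinks from 36 to 12 after pooling, while the 9 time positions produced by the convolution all survive. This mirrors the stated goal of gaining frequency-shift tolerance without blurring where in time each phone segment sits.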
Keywords
time-frequency kernels, convolutional neural network, robust speech recognition