A Multistream Multiresolution Framework For Phoneme Recognition

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2(2010)

Abstract
Spectrotemporal representations of speech have already shown promising results in speech processing technologies; however, inherent issues of such representations, such as high dimensionality, have limited their use in speech and speaker recognition. The multistream framework fits such representations well: different regions can be separately mapped into class posterior probabilities before merging. In this study, we investigated effective ways of forming streams out of this representation for robust phoneme recognition. We also investigated multiple ways of fusing the posteriors of different streams, based on their individual confidence or on the interactions between them. We observed an 8.6% relative improvement in clean conditions and 4% in noise. We developed a simple yet effective linear combination technique that provides an intuitive understanding of stream combinations and of how even systematic errors can be learned to reduce confusions.
Keywords
speech recognition, spectrotemporal modulations, multistream
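
The abstract describes a linear combination of per-stream posteriors that can learn systematic errors. The paper's actual formulation is not given here, so the following is only a minimal sketch of one way such a fusion could look, assuming each stream outputs a (frames x classes) posterior matrix and the weights are fit by ridge-regularized least squares against one-hot labels. The function names `fit_linear_fusion` and `fuse`, the bias column, and the `ridge` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def _stack(stream_posteriors):
    """Concatenate per-stream posteriors and append a bias column."""
    X = np.hstack(stream_posteriors)                  # (T, S*K)
    return np.hstack([X, np.ones((X.shape[0], 1))])   # (T, S*K + 1)


def fit_linear_fusion(stream_posteriors, labels, num_classes, ridge=1e-3):
    """Fit a full weight matrix mapping stacked posteriors to class scores.

    Assumption (not from the paper): ridge-regularized least squares
    against one-hot targets. Off-diagonal weights let one class's
    posterior contribute to another class's score, which is how a
    consistent confusion can be corrected.
    """
    X = _stack(stream_posteriors)
    Y = np.eye(num_classes)[labels]                   # one-hot targets (T, K)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)                # (S*K + 1, K)


def fuse(stream_posteriors, W):
    """Apply the learned weights and renormalize rows to sum to one."""
    scores = np.clip(_stack(stream_posteriors) @ W, 1e-8, None)
    return scores / scores.sum(axis=1, keepdims=True)
```

In a toy case where a single stream always swaps two classes, the fitted matrix places its large weights off the diagonal for those classes, so the fused output undoes the swap. This illustrates the "systematic errors can be learned" idea under the assumptions above and is not a reproduction of the paper's results.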