Toward optimizing stream fusion in multistream recognition of speech.
Journal of the Acoustical Society of America (2011)
Abstract
A multistream phoneme recognition framework is proposed in which streams are formed from different spectrotemporal modulations of speech. Phoneme posterior probabilities are estimated from each stream separately and combined at the output level. A statistical model of the final estimated posterior probabilities is used to characterize system performance. During operation, the best fusion architecture is chosen automatically to maximize the similarity of the output statistics to those obtained in clean conditions. Results on phoneme recognition from noisy speech indicate the effectiveness of the proposed method. © 2011 Acoustical Society of America
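The selection step described in the abstract can be illustrated with a small sketch. The abstract does not specify the combination rule, the statistical model, or the similarity measure, so everything concrete here is an assumption made for illustration: streams are fused by averaging their posteriors, the output statistic is the mean outer product of the posterior vectors, similarity is measured by squared Frobenius distance, and candidate architectures are simply all non-empty subsets of streams. The function names (`average_fusion`, `posterior_stats`, `select_fusion`) are hypothetical.

```python
from itertools import combinations


def average_fusion(streams):
    """Fuse streams by averaging per-frame posteriors (assumed combination rule)."""
    n_frames = len(streams[0])
    n_classes = len(streams[0][0])
    return [
        [sum(s[t][k] for s in streams) / len(streams) for k in range(n_classes)]
        for t in range(n_frames)
    ]


def posterior_stats(posteriors):
    """Mean outer product of posterior vectors, a K x K matrix (assumed statistic)."""
    n = len(posteriors)
    k = len(posteriors[0])
    return [
        [sum(p[i] * p[j] for p in posteriors) / n for j in range(k)]
        for i in range(k)
    ]


def distance(a, b):
    """Squared Frobenius distance between two matrices (assumed dissimilarity)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def select_fusion(streams, clean_stats):
    """Try every non-empty subset of streams and keep the one whose fused
    output statistics are closest to the clean-condition statistics."""
    best = None
    for r in range(1, len(streams) + 1):
        for subset in combinations(range(len(streams)), r):
            fused = average_fusion([streams[i] for i in subset])
            d = distance(posterior_stats(fused), clean_stats)
            if best is None or d < best[0]:
                best = (d, subset, fused)
    return best[1], best[2]


# Toy data: two frames, two phoneme classes. Stream 0 is confident,
# stream 1 is uninformative (as a noise-corrupted stream might be).
stream0 = [[0.9, 0.1], [0.1, 0.9]]
stream1 = [[0.5, 0.5], [0.5, 0.5]]
clean_stats = posterior_stats([[0.95, 0.05], [0.05, 0.95]])

selected, fused = select_fusion([stream0, stream1], clean_stats)
print(selected)
```

In this toy case the search keeps only the confident stream, because adding the flat stream pulls the output statistics away from the clean reference. An exhaustive subset search is only practical for a small number of streams.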