Neural Oracle Search On N-Best Hypotheses

Ehsan Variani, Tongzhou Chen, James Apfel, Bhuvana Ramabhadran, Seungji Lee, Pedro Moreno

2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (2020)


Abstract

In this paper, we propose a neural search algorithm that selects the most likely hypothesis given a sequence of acoustic representations and multiple hypotheses as input. The algorithm provides a sequence-level score for each audio-hypothesis pair, obtained by integrating information from multiple sources such as the input acoustic representations, the N-best hypotheses, additional 1st-pass statistics, and unpaired textual information through an external language model. These scores are then used to map the search problem of identifying the most likely hypothesis to a sequence classification problem. The definition of the proposed algorithm is broad enough to allow its use either as an alternative to beam search in the 1st pass or as a 2nd-pass rescoring step. This algorithm achieves up to 12% relative reduction in Word Error Rate (WER) across several languages over state-of-the-art baselines with relatively few additional parameters. We also propose a binary classifier gating function that can learn to trigger the 2nd-pass neural search model only when the 1-best hypothesis is not the oracle hypothesis, thereby avoiding extra computation.
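The abstract recasts N-best search as sequence classification: each audio-hypothesis pair receives one scalar score, and the task becomes picking the correct "class" among the N hypotheses. The Python sketch below shows one natural way to set that up, with a softmax over the N scores, cross-entropy against the oracle index (the hypothesis with the fewest word errors), and argmax at inference. It rests on loudly stated assumptions. The paper's neural scorer, which fuses acoustic representations, N-best text, 1st-pass statistics and an external LM, is replaced by a hypothetical weighted sum of per-hypothesis features (`combine_scores`). The gate (`should_rescore`) is a hypothetical logistic model on the 1st-pass top-1 margin with made-up weights. Neither component is the paper's architecture.

```python
import math
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def oracle_index(reference: str, nbest: List[str]) -> int:
    """Training label: index of the N-best entry with the fewest word errors."""
    ref = reference.split()
    errors = [word_edit_distance(ref, hyp.split()) for hyp in nbest]
    return min(range(len(nbest)), key=lambda k: errors[k])


def combine_scores(features: List[List[float]], weights: List[float]) -> List[float]:
    """HYPOTHETICAL stand-in for the neural scorer: one scalar per hypothesis,
    here a weighted sum of features such as 1st-pass and external-LM log-probs."""
    return [sum(w * f for w, f in zip(weights, feats)) for feats in features]


def log_softmax(scores: List[float]) -> List[float]:
    """Numerically stable log-softmax over the N hypothesis scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def classification_loss(scores: List[float], target: int) -> float:
    """Cross-entropy that treats the N hypotheses as N classes."""
    return -log_softmax(scores)[target]


def select_best(scores: List[float]) -> int:
    """Inference: return the index of the highest-scoring hypothesis."""
    return max(range(len(scores)), key=lambda k: scores[k])


def should_rescore(top1_margin: float, w: float = 4.0, b: float = 1.0,
                   threshold: float = 0.5) -> bool:
    """HYPOTHETICAL gate: logistic estimate of P(1-best is not the oracle).
    A small margin between the top two 1st-pass scores raises the probability."""
    p_not_oracle = 1.0 / (1.0 + math.exp(-(b - w * top1_margin)))
    return p_not_oracle >= threshold


if __name__ == "__main__":
    nbest = ["the cat sat", "a cat sat", "the cat sad"]
    # Illustrative feature values only: [1st-pass log-prob, external LM log-prob].
    features = [[-1.0, -2.5], [-1.2, -1.8], [-3.0, -4.0]]
    scores = combine_scores(features, weights=[1.0, 0.5])
    margin = features[0][0] - features[1][0]  # 1st-pass top-1 minus top-2
    if should_rescore(margin):
        print("selected:", nbest[select_best(scores)])
    print("oracle:", oracle_index("a cat sat", nbest))
    print("loss:", classification_loss(scores, oracle_index("a cat sat", nbest)))
```

With these toy numbers the gate fires (margin 0.2), the combined scores favour "a cat sat", and that entry is also the oracle. The gate sits in front of the scorer so that, as the abstract suggests, the 2nd pass runs only when the 1-best is likely wrong.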


Keywords

Speech recognition, Encoder-decoder, N-best rescoring
