Phoneme Based Domain Prediction For Language Model Adaptation

2020 International Joint Conference on Neural Networks (IJCNN), 2020

Abstract
Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) are the two key components of any voice assistant. ASR converts the input audio signal to text using an acoustic model (AM), a language model (LM) and a decoder; NLU further processes this text for sub-tasks such as predicting domain, intent and slots. Since the input to NLU is text, any error in the ASR module propagates to the NLU sub-tasks. ASR generally processes speech in small-duration windows, first generating phonemes with the acoustic model and then word lattices with the decoder, dictionary and language model. Training and maintaining a generic LM that fits the data distribution of multiple domains is difficult, so our proposed architecture uses multiple domain-specific LMs to rescore the word lattice and provides a mechanism to select which LMs to use for rescoring. In this paper, we propose a novel multistage CNN architecture that classifies the domain from a partial phoneme sequence and uses the prediction to select the top-K domain LMs. The multistage classification model based on phoneme input achieves state-of-the-art top-three-domain accuracy on two open datasets: 97.76% on ATIS and 99.57% on Snips.
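For illustration, the following is a minimal sketch of the kind of phoneme-based domain classifier the abstract describes: a CNN over a partial phoneme-ID sequence whose output logits are used to pick the top-K domain-specific LMs for lattice rescoring. This is not the paper's multistage architecture; the layer sizes, phoneme inventory size and domain count are illustrative assumptions, written in PyTorch.

```python
# A minimal sketch (not the authors' implementation) of a CNN that predicts
# the domain from a partial phoneme-ID sequence and returns the top-K domains
# whose LMs would be used for lattice rescoring. All sizes are assumptions.
import torch
import torch.nn as nn


class PhonemeDomainCNN(nn.Module):
    def __init__(self, num_phonemes=50, embed_dim=64, num_domains=8):
        super().__init__()
        self.embed = nn.Embedding(num_phonemes, embed_dim, padding_idx=0)
        # Parallel 1-D convolutions with several kernel widths over the
        # phoneme sequence, a common pattern for sequence classification.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, 128, kernel_size=k, padding=k // 2)
             for k in (3, 5, 7)]
        )
        self.classifier = nn.Linear(128 * 3, num_domains)

    def forward(self, phoneme_ids):                    # (batch, seq_len) int IDs
        x = self.embed(phoneme_ids).transpose(1, 2)    # (batch, embed, seq)
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.classifier(torch.cat(feats, dim=1))  # domain logits


if __name__ == "__main__":
    model = PhonemeDomainCNN()
    partial_seq = torch.randint(1, 50, (1, 20))        # a 20-phoneme partial input
    logits = model(partial_seq)
    top_k = torch.topk(logits, k=3, dim=1).indices     # top-3 domain LMs to rescore with
    print(top_k)
```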
Keywords
Language Adaptation, Phoneme Classification, Multistage CNN, Domain specific LM