Efficient Generation of high-order context-dependent Weighted Finite State Transducers for Speech Recognition.

ICASSP (2005)

Abstract
This paper describes an algorithm for efficiently building Weighted Finite State Transducers (WFSTs) for speech recognition when tied-state context-dependent models of order K > 3 (that is, wider than triphones) are used. We first discuss inefficiencies of the standard compilation method that make high-order context-dependent models cumbersome to use, and sometimes impossible because of memory constraints. We then show how an algorithm that builds part of the required composed transducers directly from the decision trees, combined with an improved compilation process, leads to much faster, simpler and more memory-efficient compilation. In our case it also resulted in substantially smaller final networks. With the described algorithm it is simple to use high-order full cross-word models, with little overhead, directly within a one-pass time-synchronous search. We test this by comparing final network sizes, recognition rates and speed on a large, spontaneous Japanese speech database. Using the proposed algorithm, real-time recognition with full cross-word quinphones and a large acoustic model is possible in about 125 MB of memory at about 9% search error.
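To give a feel for why the standard compilation route becomes memory-bound at higher orders, the following is a rough sketch, not the paper's algorithm. It counts states and arcs of a naively expanded context-dependency transducer, assuming each state remembers K - 1 phones and has one outgoing arc per phone, and ignoring boundary symbols, state tying and minimization. The function name and the inventory size of 40 phones are hypothetical choices for illustration; the abstract does not state the phone inventory used.

```python
from typing import Tuple


def naive_context_transducer_size(num_phones: int, order: int) -> Tuple[int, int]:
    """Return (states, arcs) for a fully expanded context-dependency transducer.

    Assumption for illustration: every state encodes a history of
    (order - 1) phones and has one outgoing arc per phone, so
    states = N ** (order - 1) and arcs = N ** order.
    """
    if num_phones < 1 or order < 1:
        raise ValueError("num_phones and order must be positive")
    states = num_phones ** (order - 1)
    arcs = states * num_phones
    return states, arcs


if __name__ == "__main__":
    # 40 phones is a hypothetical inventory size, not a figure from the paper.
    for order, name in [(3, "triphone"), (5, "quinphone")]:
        states, arcs = naive_context_transducer_size(40, order)
        print(f"{name} (K={order}): {states:,} states, {arcs:,} arcs")
```

Under these assumptions, moving from K = 3 to K = 5 multiplies both counts by the square of the inventory size, which is the kind of growth that building transducers directly from the tied-state decision trees is meant to avoid.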
Keywords
decision trees, finite state machines, speech recognition, 125 MB, context-dependent acoustic models, full cross-word quinphones, high-order context dependent weighted finite state transducers, high-order full cross-word models, one-pass time-synchronous search, phonetic decision trees, real-time recognition, recognition network size, recognition rate, tied-state triphone models