Improving the Usage of Subword-Based Units for Turkish Speech Recognition

2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU)(2020)

引用 1|浏览14
暂无评分
摘要
Subword units are often utilized to achieve better performance in speech recognition because of the high number of observed words in agglutinative languages. In this study, the proper use of subword units is explored in recognition by a reconsideration of details such as silence modeling and positiondependent phones. A modified lexicon by finite-state transducers is implemented to represent the subword units correctly. Also, we experiment with different types of word boundary markers and achieve the best performance by adding a marker both to the left and right side of a subword unit. In our experiments on a Turkish broadcast news dataset, the subword models do outperform wordbased models and naive subword implementations. Results show that using proper subword units leads to a relative word error rate (WER) reductions, which is 2.4%, compared with the word level automatic speech recognition (ASR) system for Turkish.
更多
查看译文
关键词
speech recognition, language modelling, acoustic modelling
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要