An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech.

Yingming Gao, Xinyu Zhang, Yi Xu, Jinsong Zhang, Peter Birkholz

INTERSPEECH (2020)

Abstract

The complex F0 variations in continuous speech make automatic tone recognition rather difficult in a language like Mandarin Chinese. In this study, we tested the use of the target approximation model (TAM) for continuous tone recognition on two datasets. TAM simulates F0 production from an articulatory point of view and thus allows the underlying pitch targets to be recovered from the surface F0 contour. In the first dataset, the F0 contour of each tone, represented by 30 equidistant points, was simulated with TAM. Tone classification with a support vector machine (SVM) showed that the estimated three-dimensional TAM parameters characterized tone patterns about as well as the representation by 30 F0 values. TAM was further tested on the second dataset, which contains more complex tonal variations. With an equal or smaller number of features, the TAM parameters performed better than the cosine transform coefficients and slightly worse than the statistical F0 parameters for tone recognition. Furthermore, we investigated a bidirectional LSTM (BLSTM) neural network for modeling the sequential tonal variations, which proved to be more powerful than the SVM classifier. The BLSTM system combining TAM and statistical F0 parameters achieved the best accuracy, 87.56%.
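
The abstract does not spell out which three TAM parameters are estimated. A minimal sketch, assuming the quantitative TA (qTA) formulation: a linear pitch target x(t) = m*t + b (slope m, height b) approached by a third-order critically damped system with strength lam, so that F0(t) = m*t + b + (c1 + c2*t + c3*t^2)*exp(-lam*t), with c1..c3 fixed by the initial F0 value, velocity and acceleration. The function names `qta_contour` and `fit_qta`, the semitone/second units, the zero initial velocity and acceleration, and the grid of lam values are all illustrative assumptions, not the authors' implementation. Because F0(t) is affine in (m, b) for a fixed lam, the fit below solves for (m, b) exactly by linear least squares at each grid value of lam and keeps the best.

```python
import numpy as np


def qta_contour(t, m, b, lam, y0, v0=0.0, a0=0.0):
    """Critically damped 3rd-order approach to the linear target m*t + b.

    y0, v0, a0: initial F0 value, velocity and acceleration at t = 0
    (assumption: in a full system these carry over from the previous syllable).
    """
    c1 = y0 - b                                       # matches F0(0) = y0
    c2 = v0 + c1 * lam - m                            # matches F0'(0) = v0
    c3 = (a0 + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0  # matches F0''(0) = a0
    return m * t + b + (c1 + c2 * t + c3 * t ** 2) * np.exp(-lam * t)


def fit_qta(t, f0, lambdas=np.arange(1.0, 100.5, 0.5)):
    """Estimate (m, b, lam) for one syllable; returns (m, b, lam, sse).

    Assumes y0 equals the first F0 sample and v0 = a0 = 0 (hypothetical choice).
    """
    f0 = np.asarray(f0, dtype=float)
    y0 = f0[0]
    best = None
    for lam in lambdas:
        # For fixed lam the contour is affine in (m, b): build its basis columns.
        base = qta_contour(t, 0.0, 0.0, lam, y0)
        col_m = qta_contour(t, 1.0, 0.0, lam, y0) - base
        col_b = qta_contour(t, 0.0, 1.0, lam, y0) - base
        A = np.column_stack([col_m, col_b])
        coef, *_ = np.linalg.lstsq(A, f0 - base, rcond=None)
        sse = float(np.sum((A @ coef + base - f0) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], lam)
    sse, m, b, lam = best
    return m, b, lam, sse


# Usage: 30 equidistant points over a 200 ms syllable (synthetic, not paper data).
t = np.linspace(0.0, 0.2, 30)
f0 = qta_contour(t, -20.0, 5.0, 30.0, 2.0)
print(fit_qta(t, f0))
```

Under these assumptions each syllable's 30-point contour collapses to a three-dimensional feature vector (m, b, lam) that could then be passed to a classifier such as an SVM; the grid search over lam is just one simple fitting strategy.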

Keywords

continuous Mandarin speech, tone modeling and recognition, target approximation model, LSTM neural network