A Speech Quality Evaluation Method Based on Auditory Characteristic

International Conference on Control and Automation (2016)

Abstract
In this paper, an objective measure for speech quality evaluation based on auditory characteristics is proposed. First, features are extracted with a Gammatone auditory filter bank to calculate the Gammatone Frequency Cepstrum Coefficients (GFCC) of the original speech and the distorted speech. From the two sets of GFCC, the average distortion distance is obtained. Then, to get an objective estimate of the Mean Opinion Score (MOS), the normalized average distortion distance is mapped to the MOS scale. Experiments show that the proposed algorithm greatly reduces computational complexity, while the correlation between the subjective MOS and the objective MOS is close to that achieved by the Perceptual Evaluation of Speech Quality (PESQ). This shows that the new algorithm is useful for real-time online monitoring of speech quality.

In this paper, an objective speech quality evaluation method based on auditory characteristics is presented. A Gammatone filter bank that accords with the human auditory system is used to extract characteristic parameters. The GFCC of the original voice and the distorted voice are then calculated to find the average distortion distance, and the mapping between the subjective MOS and the normalized average distortion distance is established. Finally, the objective MOS value is obtained from this mapping, and the performance of the algorithm is compared with PESQ.

II. FEATURE EXTRACTION

Studies of human ear physiology show that the human auditory system is composed of the external ear, the middle ear and the inner ear. Speech signals pass through the external ear, middle ear and inner ear in turn and enter the central auditory system after being decomposed by the basilar membrane of the cochlea. The cochlea is the key component of the whole system.
When the speech signal reaches the basilar membrane of the cochlea, the membrane vibrates in the form of a traveling wave, and its acoustic response depends on the frequency of the speech signal. The frequency decomposition performed by the basilar membrane is an important part of speech signal processing in the human auditory system (3). In this paper, we use the Gammatone filter bank to simulate the cochlear model and extract characteristic parameters that correspond to the subjective perception of the human ear.
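
The processing chain described above (Gammatone filtering, GFCC, average distortion distance) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the excerpt does not give the number of channels, the frame and hop lengths, the compression, or the distance measure, so the 32 channels, 20 ms frames with 10 ms hop, cube-root compression and per-frame Euclidean distance are all assumptions, as are the function names. The ERB formulas are the commonly used Glasberg and Moore approximations.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def hz_to_erb_rate(f):
    return 21.4 * np.log10(4.37e-3 * f + 1.0)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatone_ir(fc, fs, dur=0.025, order=4):
    """Unit-energy impulse response of one Gammatone filter centred at fc."""
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * erb(fc)  # bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def gfcc(x, fs, n_ch=32, n_coef=13, frame=0.020, hop=0.010):
    """Return GFCC as an array of shape (n_frames, n_coef). Parameters are assumed."""
    # Centre frequencies equally spaced on the ERB-rate scale (assumed range).
    fcs = erb_rate_to_hz(np.linspace(hz_to_erb_rate(50.0),
                                     hz_to_erb_rate(0.45 * fs), n_ch))
    flen, fhop = int(frame * fs), int(hop * fs)
    n_frames = 1 + (len(x) - flen) // fhop
    energy = np.empty((n_frames, n_ch))
    for c, fc in enumerate(fcs):
        y = np.convolve(x, gammatone_ir(fc, fs))[:len(x)]  # causal filtering
        for m in range(n_frames):
            seg = y[m * fhop:m * fhop + flen]
            energy[m, c] = np.mean(seg ** 2)
    compressed = np.cbrt(energy)  # assumed cube-root loudness compression
    # DCT-II across channels decorrelates the compressed channel energies.
    k = np.arange(n_coef)[:, None]
    n = np.arange(n_ch)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_ch))
    return compressed @ dct.T

def average_distortion(ref, deg, fs):
    """Mean per-frame Euclidean distance between the GFCC of two signals (assumed metric)."""
    a, b = gfcc(ref, fs), gfcc(deg, fs)
    n = min(len(a), len(b))
    return float(np.mean(np.linalg.norm(a[:n] - b[:n], axis=1)))
```

Calling `average_distortion(original, distorted, fs)` on two time-aligned signals gives a single non-negative number that is zero for identical inputs. Turning that number into an objective MOS still requires normalizing it and fitting a mapping against subjective scores; the excerpt does not specify that mapping, so it is left out here.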
Keywords
speech quality,objective evaluation,GFCC