Multiple time-instances features based approach for reference-free speech quality measurement

COMPUTER SPEECH AND LANGUAGE (2023)

Abstract
This paper investigates the problem of measuring the quality of a received speech signal without access to the original speech signal. Speech quality deteriorates because of noise present in the surroundings. To this end, we propose a multiple time-instances (MTI) features-based approach to reference-free speech quality measurement. A voice activity detector (VAD) is first used to determine the number of active speech chunks in a speech signal. For these chunks and their successive combinations, here called batches, multi-resolution auditory model (MRAM), mel-frequency cepstral coefficient (MFCC), and line spectral frequency (LSF) features are extracted; these are referred to as MTI features. It is hypothesized that the MTI features can capture the distortions caused by time-localized effects of short-time transients and impulsive noise, as well as how such noise differs from plosive sounds. MTI metric estimates (MTI-ME) are computed from these MTI features using Gaussian mixture model (GMM) probabilistic modeling. The overall objective speech quality of a speech signal is then determined as a linear combination of optimally weighted MTI-ME corresponding to the distinct active speech chunks and batches of that signal. The optimal weights are computed using either a minimum mean square error criterion or a Pearson's correlation maximization criterion. In addition, a deep neural network (DNN)-based speech quality model is developed that computes a single objective speech quality score while considering all active speech chunks together. Performance is evaluated using Pearson's correlation coefficient and weighted average correlation. Results demonstrate that the proposed model improves on the standard speech quality model (P.563), raising correlation values by around 37%.
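The weighted-combination step can be illustrated with a minimal sketch. It assumes the minimum mean square error criterion reduces to unconstrained ordinary least squares, which the abstract does not confirm; the paper may use constraints or a different solver. The function names (`pearson`, `mmse_weights`, `combine`) and the toy numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def mmse_weights(estimates, mos):
    """Assumed MMSE criterion: least-squares weights from the normal equations.

    estimates[i][k] is the metric estimate for chunk/batch k of signal i;
    mos[i] is the subjective score of signal i.
    """
    k = len(estimates[0])
    gram = [[sum(row[p] * row[q] for row in estimates) for q in range(k)]
            for p in range(k)]
    rhs = [sum(row[p] * y for row, y in zip(estimates, mos)) for p in range(k)]
    return solve(gram, rhs)


def combine(estimates, weights):
    """Overall quality per signal as a weighted linear combination."""
    return [sum(w * e for w, e in zip(weights, row)) for row in estimates]


# Toy data (made up for illustration): 4 signals, 2 chunk-level estimates each.
toy_estimates = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
toy_mos = [1.5, 1.5, 3.5, 3.0]

weights = mmse_weights(toy_estimates, toy_mos)
predicted = combine(toy_estimates, weights)
correlation = pearson(predicted, toy_mos)
```

Under the alternative criterion named in the abstract, the weights would instead be chosen to maximize `pearson(predicted, mos)` directly; that variant is not shown here.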
Keywords
Processed speech, Voice activity detector, Multi-resolution auditory model, Multiple time-instances features, Deep neural network, Reference-free speech quality