Multi-Channel MOSRA: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and a Teacher Model
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)
Abstract
Previous methods for predicting room acoustic parameters and speech quality
metrics have focused on the single-channel case, where room acoustics and Mean
Opinion Score (MOS) are predicted for a single recording device. However,
quality-based device selection for rooms with multiple recording devices may
benefit from a multi-channel approach where the descriptive metrics are
predicted for multiple devices in parallel. Following our hypothesis that a
model may benefit from multi-channel training, we develop a multi-channel model
for joint MOS and room acoustics prediction (MOSRA) for five channels in
parallel. Because no multi-channel audio data with ground-truth labels was
available, we created simulated data with an acoustic simulator: room acoustic
labels were extracted from the generated impulse responses, and MOS labels were
generated in a student-teacher setup using a wav2vec2-based MOS prediction
model as the teacher. Our experiments show that the multi-channel model improves
the prediction of the direct-to-reverberant ratio, clarity, and speech
transmission index over the single-channel model while requiring roughly 5× less
computation, with minimal loss of performance on the other metrics.
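The abstract states that room acoustic labels were extracted from the simulated impulse responses but does not give the exact definitions used. As an illustration only, the sketch below computes two of the predicted parameters, the direct-to-reverberant ratio (DRR) and clarity (C50), from a single room impulse response using common textbook definitions. The ±2.5 ms direct-path window, the 50 ms early/late split, peak-picking for the direct-sound onset, and the function names `drr_db` and `clarity_db` are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np


def _onset_index(rir: np.ndarray) -> int:
    """Assumed direct-path onset: index of the largest absolute sample."""
    return int(np.argmax(np.abs(rir)))


def drr_db(rir: np.ndarray, fs: int, half_window_ms: float = 2.5) -> float:
    """Direct-to-reverberant ratio in dB (hypothetical helper).

    Direct energy: samples within +/- half_window_ms of the onset peak.
    Reverberant energy: all samples after that window.
    """
    energy = rir.astype(float) ** 2
    n0 = _onset_index(rir)
    w = int(round(half_window_ms * 1e-3 * fs))
    start = max(n0 - w, 0)
    stop = n0 + w + 1
    direct = energy[start:stop].sum()
    reverb = energy[stop:].sum()
    return float(10.0 * np.log10(direct / reverb))


def clarity_db(rir: np.ndarray, fs: int, early_ms: float = 50.0) -> float:
    """Clarity in dB (C50 by default), measured from the onset peak (hypothetical helper).

    Early energy: [onset, onset + early_ms); late energy: everything after.
    """
    energy = rir.astype(float) ** 2
    n0 = _onset_index(rir)
    split = n0 + int(round(early_ms * 1e-3 * fs))
    early = energy[n0:split].sum()
    late = energy[split:].sum()
    return float(10.0 * np.log10(early / late))


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    t = np.arange(int(0.5 * fs)) / fs
    # Synthetic RIR: exponentially decaying noise tail plus a unit direct-path impulse
    rir = 0.05 * rng.standard_normal(t.size) * np.exp(-t / 0.1)
    rir[:100] = 0.0
    rir[100] = 1.0
    print(f"DRR = {drr_db(rir, fs):.1f} dB, C50 = {clarity_db(rir, fs):.1f} dB")
```

For a multi-channel setup such as the five-channel case described above, the same functions would be applied to each device's impulse response to obtain one label set per channel.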
Keywords
Speech quality assessment, joint learning, room acoustics, neural networks