Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers
CoRR (2024)
Abstract
Recently, a mask-based beamformer with an attention-based spatial covariance
matrix aggregator (ASA) was proposed, which was demonstrated to track moving
sources accurately. However, the deep neural network model used in this
algorithm is tied to a specific channel configuration, so a different
model is required whenever the channel permutation, channel count, or microphone
array geometry changes. To address this limitation, we
investigate three approaches to improve the robustness of the ASA-based
tracking method against such variations: incorporating random channel
configurations during the training process, employing the
transform-average-concatenate (TAC) method to process multi-channel input
features (allowing for any channel count and enabling permutation invariance),
and utilizing input features that are robust against variations of the channel
configuration. Our experiments, conducted using the CHiME-3 and DEMAND
datasets, demonstrate improved robustness against mismatches in channel
permutations, channel counts, and microphone array geometries compared to the
conventional ASA-based tracking method without compromising performance in
matched conditions, suggesting that the mask-based beamformer with ASA
integrating the proposed approaches has the potential to track moving sources
for arbitrary microphone arrays.
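
As an illustration of why a transform-average-concatenate (TAC) layer accepts any channel count and is insensitive to channel order, here is a minimal NumPy sketch. It is not the model from the paper: the function name `tac_layer`, the weight names, the layer sizes, and the use of ReLU as the nonlinearity are all assumptions made for the example. Each channel is transformed with shared weights, the result is averaged over channels, and the average is concatenated back onto every channel before a final shared transform.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def tac_layer(x, w_transform, w_average, w_concat):
    """Illustrative TAC layer (hypothetical names and sizes).

    x: array of shape (channels, features).
    Returns an array of shape (channels, out_features).
    """
    # Transform: the same weights are applied to every channel.
    h = relu(x @ w_transform)
    # Average: pool over the channel axis, then transform the pooled vector.
    avg = relu(h.mean(axis=0, keepdims=True) @ w_average)
    # Concatenate: attach the pooled vector to each channel's features.
    avg = np.broadcast_to(avg, (h.shape[0], avg.shape[1]))
    return relu(np.concatenate([h, avg], axis=1) @ w_concat)


rng = np.random.default_rng(0)
feat, hidden = 8, 16  # assumed sizes, chosen only for the example
w_t = rng.standard_normal((feat, hidden))
w_a = rng.standard_normal((hidden, hidden))
w_c = rng.standard_normal((2 * hidden, feat))

# Six-channel input, plus a permuted copy and a four-channel subset.
x6 = rng.standard_normal((6, feat))
perm = rng.permutation(6)

y6 = tac_layer(x6, w_t, w_a, w_c)
y6_perm = tac_layer(x6[perm], w_t, w_a, w_c)
y4 = tac_layer(x6[:4], w_t, w_a, w_c)

# Permuting the input channels only permutes the output rows.
print(np.allclose(y6_perm, y6[perm]))
# The same weights handle a different channel count.
print(y4.shape)
```

In this sketch the per-channel outputs follow the input channel order, while the pooled average does not depend on that order, so any later stage that pools over channels sees the same result for every permutation. No weight has a dimension tied to the number of channels, which is why the four-channel input runs through the same parameters.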