Dynamic speaker localization based on a novel lightweight R–CNN model

Neural Comput. Appl. (2023)

Abstract
This study proposes a novel sound localization approach that provides the 3D coordinates of a real moving speaker. The study uses sound recordings of a real user in an indoor environment. Four conventional microphones simultaneously recorded speech signals as the user moved between 14 predetermined locations. To separate environmental noise from the recorded signals and accurately determine the origin of speech, a z-score-based peak detection approach is used. The delays between the acquired speech signals are calculated with the generalized cross-correlation with phase transform (GCC-PHAT) approach. The resulting delays are transformed into a special distance matrix, and each matrix is assigned to a particular speaker location in 3D space. A novel lightweight convolutional neural network-based deep regression network was constructed to learn the relationship between these distance matrices and the real 3D location information. As a result, the sound localization problem is transformed from an iterative solution into a regression problem. Using low-cost conventional microphones and hardware, the approach determines the position of the moving speaker with higher accuracy than the particle swarm optimization-based time difference of arrival (TDOA) approach. In the performance comparison, the average localization deviation of 45.826 cm obtained with the TDOA-based sound source localization approach was reduced to 16.298 cm with the proposed approach.
Keywords
Sound source localization, Deep regression network, R-CNN, GCC-PHAT, TDOA
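
The delay-estimation step named in the abstract (GCC-PHAT) can be illustrated with a minimal sketch. This is a generic textbook formulation, not the authors' implementation: the function name `gcc_phat_delay`, the sampling rate, and the synthetic test signal are all assumptions made for illustration, and the paper's z-score preprocessing and distance-matrix construction are not reproduced here.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means `sig`
    arrives later than `ref`.
    """
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation
    max_shift = n // 2
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Synthetic example (assumed values): white noise delayed by 5 samples.
fs = 16000                               # assumed sampling rate in Hz
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref[:-5]))
tau = gcc_phat_delay(sig, ref, fs)
```

With four microphones, one such delay would be estimated per microphone pair. Multiplying a delay by the speed of sound gives a path-length difference, which is one plausible ingredient of the distance matrix the abstract mentions, though the abstract does not specify how that matrix is built.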