Identity, Gender, Age, and Emotion Recognition from Speaker Voice with Multi-task Deep Networks for Cognitive Robotics

Cognitive Computation (2024)

Abstract
This paper presents a study on the use of multi-task neural networks (MTNs) for voice-based soft biometrics recognition (e.g., gender, age, and emotion) in social robots. MTNs enable efficient analysis of audio signals for various tasks on low-power embedded devices, thus eliminating the need for cloud-based solutions that introduce network latency. However, the strict dataset requirements for training limit the potential of MTNs, which are commonly used to optimize a single reference problem. In this paper, we propose three MTN architectures with varying accuracy-complexity trade-offs for voice-based soft biometrics recognition. In addition, we adopt a learnable voice representation that allows the specific cognitive robotics application to adapt to the environmental conditions. We evaluate the performance of these models on standard large-scale benchmarks, and our results show that the proposed architectures outperform baseline models for most individual tasks. Furthermore, one of our proposed models achieves state-of-the-art performance on three of the four considered benchmarks. The experimental results demonstrate that the proposed MTNs have the potential to be part of effective and efficient voice-based soft biometrics recognition in social robots.
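As a rough illustration of the multi-task idea summarized above (one shared encoder feeding a lightweight head per task, trained with a combined loss), the following is a minimal forward-pass sketch of hard parameter sharing in plain Python. It is not the authors' architecture: the task list, output sizes, layer widths, the 40-dimensional input feature vector (standing in for the learnable voice representation, which is not modeled here), treating age as normalized regression, and the equal loss weights are all hypothetical choices, and training by backpropagation is omitted.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical task set and output sizes (assumptions, not the paper's values).
# "age" is treated as a single normalized regression output.
TASKS: Dict[str, int] = {"identity": 10, "gender": 2, "age": 1, "emotion": 4}

Layer = Tuple[List[List[float]], List[float]]


def linear(x: List[float], w: List[List[float]], b: List[float]) -> List[float]:
    """Dense layer: y[j] = sum_i x[i] * w[j][i] + b[j]."""
    return [sum(xi * wji for xi, wji in zip(x, row)) + bj for row, bj in zip(w, b)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, t) for t in v]


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    # Subtract the max logit for numerical stability (log-sum-exp trick).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class MultiTaskNet:
    """Hard parameter sharing: one shared encoder, one linear head per task."""

    def __init__(self, n_features: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.shared = init_layer(n_features, n_hidden, rng)
        self.heads = {t: init_layer(n_hidden, k, rng) for t, k in TASKS.items()}

    def forward(self, x: List[float]) -> Dict[str, List[float]]:
        h = relu(linear(x, *self.shared))  # shared embedding, computed once
        return {t: linear(h, *p) for t, p in self.heads.items()}


def multitask_loss(outputs: Dict[str, List[float]],
                   targets: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses: MSE for age, cross-entropy otherwise."""
    total = 0.0
    for task, out in outputs.items():
        if task == "age":
            loss = (out[0] - targets[task]) ** 2
        else:
            loss = softmax_cross_entropy(out, int(targets[task]))
        total += weights[task] * loss
    return total


# Usage with a random feature vector and hypothetical labels.
net = MultiTaskNet(n_features=40, n_hidden=16)
rng = random.Random(1)
x = [rng.random() for _ in range(40)]
outputs = net.forward(x)
targets = {"identity": 3, "gender": 1, "age": 0.35, "emotion": 2}
weights = {t: 1.0 for t in TASKS}
total_loss = multitask_loss(outputs, targets, weights)
print(f"combined loss: {total_loss:.4f}")
```

The design point is that the shared encoder runs once per utterance and each additional task only adds a small head, which is why a multi-task network can be cheaper on low-power embedded hardware than running one full network per task.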
Keywords
Deep learning, Multi-task learning, Deep audio representation learning, Cognitive robotics, Soft biometrics, Voice analysis