Voice-Face Cross-Modal Association Learning Based on Deep Residual Shrinkage Network

Junyu Li, Lin Tan, Yuchen Zhou, Jingyi Mao, Ziqing Liu, Fanliang Bu

2023 IEEE International Conference on Image Processing and Computer Applications (ICIPCA) (2023)

Abstract

Learning associations between voices and faces has attracted growing interest in recent years, but current voice-face cross-modal association methods suffer from limited feature extraction capability and insufficient semantic association. To address these issues, we propose a voice-face cross-modal association learning method based on a deep residual shrinkage network. First, a deep residual shrinkage block is added to a dual-stream residual network to improve training efficiency and obtain more discriminative embedding features. Then, a multi-similarity loss is used for metric learning to exploit the connections between the voice and face modalities and to improve the network's robustness and generalization ability. Experimental results on voice-face cross-modal verification, matching, and retrieval tasks indicate that our method improves accuracy by about 2% over existing baseline methods.
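
The abstract names two building blocks without giving their formulas. The sketch below illustrates them in their commonly published forms, not as the authors' implementation: soft thresholding, which is the core operation of a residual shrinkage block, and a multi-similarity loss over a voice-face similarity matrix. The function names, the fixed scaling factor `alpha_scale` (learned by a small sub-network in a real shrinkage block), and the hyperparameters `alpha`, `beta`, and `lam` are assumptions chosen for illustration. Pair mining, which the multi-similarity loss usually includes, is omitted.

```python
import math

def soft_threshold(x, tau):
    """Shrink every value toward zero by tau; values with |v| <= tau become 0."""
    out = []
    for v in x:
        mag = abs(v) - tau
        out.append(0.0 if mag <= 0 else (mag if v > 0 else -mag))
    return out

def shrink_channel(x, alpha_scale):
    """Threshold one feature channel with tau = alpha_scale * mean(|x|).

    Assumption: alpha_scale in (0, 1) is passed in as a constant here;
    in a trained shrinkage block it would be predicted by the network.
    """
    tau = alpha_scale * sum(abs(v) for v in x) / len(x)
    return soft_threshold(x, tau)

def multi_similarity_loss(sim, voice_ids, face_ids, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-similarity loss without pair mining.

    sim[i][k] is the similarity between voice i and face k;
    a pair is positive when both belong to the same identity.
    """
    total = 0.0
    for i, row in enumerate(sim):
        pos = sum(math.exp(-alpha * (s - lam))
                  for k, s in enumerate(row) if face_ids[k] == voice_ids[i])
        neg = sum(math.exp(beta * (s - lam))
                  for k, s in enumerate(row) if face_ids[k] != voice_ids[i])
        total += math.log1p(pos) / alpha + math.log1p(neg) / beta
    return total / len(sim)

# Small activations are zeroed, large ones are shrunk by tau.
shrunk = soft_threshold([-3.0, -0.5, 0.2, 2.0], 1.0)

# Two identities: matched pairs on the diagonal vs. on the off-diagonal.
aligned = multi_similarity_loss([[0.9, 0.1], [0.1, 0.9]], [0, 1], [0, 1])
misaligned = multi_similarity_loss([[0.1, 0.9], [0.9, 0.1]], [0, 1], [0, 1])
```

In this toy setup the loss is lower when same-identity voice-face pairs have high similarity and different-identity pairs have low similarity, which is the behavior the metric-learning step relies on.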

Keywords

cross-modal association learning, deep residual shrinkage network, multi-similarity loss, cross-modal matching
