KAN-AV dataset for audio-visual face and speech analysis in the wild

Image and Vision Computing (2023)

Abstract
Human-computer interaction is becoming increasingly prevalent in daily life with the adoption of intelligent devices. These devices must be capable of interacting in diverse settings, such as environments with noise, music, and varying illumination and occlusion conditions. They must also interact with a variety of end users across ages and backgrounds. The machine learning community therefore needs in-the-wild multi-modal datasets to develop face and speech analysis models that are applicable in most real-world scenarios. However, most existing audio and audio-visual databases are captured in controlled conditions with few or no age and kinship labels. In this paper, we introduce the KAN-AV dataset, which contains 98 h of audio-visual data from 970 identities across ages. Two-thirds of the identities have kin relations in the dataset. The dataset is manually annotated with labels for kinship, age, and gender and is intended to drive future research in face and speech analysis.
Keywords
KAN-AV, Speaker verification, Kinship verification, Age-invariant, Cross-modal matching, Audio-visual