ATOSE: Audio Tagging with One-Sided Joint Embedding

Jaehwan Lee, Daekyeong Moon, Jik-Soo Kim, Minkyoung Cho

APPLIED SCIENCES-BASEL (2023)

Abstract
Audio auto-tagging is the process of assigning labels to audio clips for better categorization and management of audio file databases. With the advent of advanced artificial intelligence technologies, there has been increasing interest in feeding raw audio data directly into deep learning models to perform tagging, eliminating the need for preprocessing. Unfortunately, most current studies of audio auto-tagging cannot effectively reflect the semantic relationships between tags; for instance, the connection between "classical music" and "cello". In this paper, we propose a novel method that enhances audio auto-tagging performance via joint embedding. Our model is carefully designed and architected to capture the semantic information within the tag domain. In experiments on the MagnaTagATune (MTAT) dataset, which has high inter-tag correlations, and the Speech Commands dataset, which has no inter-tag correlations, we show that our approach improves the performance of existing models when strong inter-tag correlations are present.
Keywords
deep learning,music auto-tagging,joint embedding
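
The abstract describes scoring tags against audio by embedding both in a shared space, so that semantically related tags can lie close together. The sketch below is not the authors' ATOSE architecture; the 1D-CNN audio encoder over raw waveforms, the embedding size, and the dot-product scoring are illustrative assumptions about how such a joint embedding could be set up.

```python
# Minimal sketch of audio/tag joint embedding for multi-label auto-tagging (PyTorch).
# NOT the ATOSE model: encoder layers, embedding size, and scoring are assumptions.
import torch
import torch.nn as nn

class JointEmbeddingTagger(nn.Module):
    def __init__(self, num_tags: int, embed_dim: int = 128):
        super().__init__()
        # Audio encoder over the raw waveform: a small 1D CNN followed by pooling.
        self.audio_encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Learnable tag embeddings share the audio embedding space, so related
        # tags (e.g. "classical" and "cello") can end up close to each other.
        self.tag_embeddings = nn.Embedding(num_tags, embed_dim)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, samples) -> tag logits: (batch, num_tags)
        audio_emb = self.audio_encoder(waveform)            # (batch, embed_dim)
        logits = audio_emb @ self.tag_embeddings.weight.T   # (batch, num_tags)
        return logits

# Usage: multi-label tagging trained with a binary cross-entropy objective.
model = JointEmbeddingTagger(num_tags=50)
waveform = torch.randn(8, 1, 16000)            # eight one-second clips at 16 kHz
targets = torch.randint(0, 2, (8, 50)).float() # binary tag annotations
loss = nn.BCEWithLogitsLoss()(model(waveform), targets)
loss.backward()
```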