On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations
CoRR(2024)
摘要
Audio embeddings are crucial tools in understanding large catalogs of music.
Typically embeddings are evaluated on the basis of the performance they provide
in a wide range of downstream tasks, however few studies have investigated the
local properties of the embedding spaces themselves which are important in
nearest neighbor algorithms, commonly used in music search and recommendation.
In this work we show that when learning audio representations on music datasets
via contrastive learning, musical properties that are typically homogeneous
within a track (e.g., key and tempo) are reflected in the locality of
neighborhoods in the resulting embedding space. By applying appropriate data
augmentation strategies, localisation of such properties can not only be
reduced but the localisation of other attributes is increased. For example,
locality of features such as pitch and tempo that are less relevant to
non-expert listeners, may be mitigated while improving the locality of more
salient features such as genre and mood, achieving state-of-the-art performance
in nearest neighbor retrieval accuracy. Similarly, we show that the optimal
selection of data augmentation strategies for contrastive learning of music
audio embeddings is dependent on the downstream task, highlighting this as an
important embedding design decision.
更多查看译文
关键词
Music audio embeddings,data augmentation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要