BUTTER: A Representation Learning Framework for Bi-directional Music-Sentence Retrieval and Generation

NLP4MUSA (2020)

Abstract
We propose BUTTER, a unified multimodal representation learning model for Bidirectional mUsic-senTence ReTrieval and GenERation. Based on the variational autoencoder framework, our model learns three interrelated latent representations: 1) a latent music representation, which can be used to reconstruct a short piece; 2) keyword embeddings of music descriptions, which can be used for caption generation; and 3) a cross-modal representation, which is disentangled into several different attributes of music by aligning the latent music representation with the keyword embeddings. By mapping between the different latent representations, our model can search for or generate music given an input text description, and vice versa. Moreover, the model enables controlled music transfer by partially changing the keywords of the corresponding descriptions.
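To make the three-representation design concrete, the following is a minimal sketch of the data flow the abstract describes: a music encoder, a keyword encoder, and an alignment step producing a shared cross-modal latent. All dimensions, the linear encoders, and the averaging alignment are illustrative assumptions; the paper's actual model is a trained VAE with learned disentanglement, not these stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- not specified in the abstract.
MUSIC_DIM, TEXT_DIM, LATENT_DIM = 128, 64, 16

# Toy linear "encoders" standing in for the learned VAE encoders.
W_music = rng.normal(scale=0.1, size=(LATENT_DIM, MUSIC_DIM))
W_text = rng.normal(scale=0.1, size=(LATENT_DIM, TEXT_DIM))

def encode_music(x):
    """Map a music feature vector to the latent music representation."""
    return W_music @ x

def encode_keywords(t):
    """Map a keyword/description feature vector to its keyword embedding."""
    return W_text @ t

def cross_modal(z_music, z_keywords):
    """Combine the two latents into a shared cross-modal representation.
    Here a simple average; the paper learns this alignment instead."""
    return 0.5 * (z_music + z_keywords)

# Toy inputs: one music clip and one text description as feature vectors.
music_features = rng.normal(size=MUSIC_DIM)
text_features = rng.normal(size=TEXT_DIM)

z_m = encode_music(music_features)       # latent music representation
z_k = encode_keywords(text_features)     # keyword embedding
z_c = cross_modal(z_m, z_k)              # shared cross-modal latent
```

Retrieval in either direction then amounts to comparing cross-modal latents (e.g., by cosine similarity), and generation to decoding from the latent of the opposite modality.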