JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation
CoRR (2023)
Abstract
With rapid advances in generative artificial intelligence, the text-to-music
synthesis task has emerged as a promising direction for music generation from
scratch. However, finer-grained control over multi-track generation remains an
open challenge. Existing models exhibit strong raw generation capability but
lack the flexibility to compose separate tracks and combine them in a
controllable manner, differing from typical workflows of human composers. To
address this issue, we propose JEN-1 Composer, a unified framework to
efficiently model marginal, conditional, and joint distributions over
multi-track music via a single model. The JEN-1 Composer framework can
seamlessly incorporate any diffusion-based music generation system, e.g.,
JEN-1, extending it to versatile multi-track music generation. We introduce a
curriculum training strategy that incrementally guides the model from
single-track generation to the flexible generation of multi-track combinations.
During inference, users can iteratively generate and select music tracks that
match their preferences, incrementally building a complete musical composition
through the proposed Human-AI co-composition workflow. Quantitative and
qualitative assessments demonstrate state-of-the-art performance in
controllable and high-fidelity multi-track music synthesis. The proposed JEN-1
Composer represents a significant advance toward interactive AI-facilitated
music creation and composition. Demos will be available at
https://jenmusic.ai/audio-demos.
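The abstract's core idea, covering marginal, conditional, and joint track distributions with one model, can be pictured as a task sampler that decides at each training step which tracks the model must generate and which it receives as conditioning. The sketch below is a minimal illustration under assumptions of our own: the track names, three-stage curriculum, and sampling rules are hypothetical, not the paper's exact recipe.

```python
import random

# Illustrative track names; the actual stems used by JEN-1 Composer may differ.
TRACKS = ["bass", "drums", "instrument", "melody"]

def sample_task(stage: int, tracks=TRACKS):
    """Pick which tracks to generate vs. condition on for one training step.

    stage 1: generate a single track, no conditioning (marginal distribution)
    stage 2: generate a proper subset conditioned on the rest (conditionals)
    stage 3: generate all tracks at once (joint distribution)
    Returns (generate, condition), a partition of `tracks`.
    """
    if stage == 1:
        generate = [random.choice(tracks)]
    elif stage == 2:
        k = random.randint(1, len(tracks) - 1)
        generate = random.sample(tracks, k)
    else:
        generate = list(tracks)
    condition = [t for t in tracks if t not in generate]
    return generate, condition
```

A curriculum in this spirit would train on stage-1 tasks first, then mix in stage-2 and stage-3 tasks, so one diffusion model learns every generate/condition split rather than a fixed track order.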
Keywords: music