ConTuner: Singing Voice Beautifying with Pitch and Expressiveness Condition
CoRR(2024)
Abstract
Singing voice beautifying is a novel task that has application value in
people's daily life, aiming to correct the pitch of the singing voice and
improve the expressiveness without changing the original timbre and content.
Existing methods rely on paired data or only concentrate on the correction of
pitch. However, professional songs and amateur songs from the same person are
hard to obtain, and singing voice beautifying involves not only pitch
correction but also other aspects such as emotion and rhythm. Hence, we propose a fast
and high-fidelity singing voice beautifying system called ConTuner, a diffusion
model combined with the modified condition to generate the beautified
Mel-spectrogram, where the modified condition is composed of optimized pitch
and expressiveness. For pitch correction, we establish a mapping relationship
from MIDI and spectrum envelope to pitch. To make amateur singing more expressive,
we propose the expressiveness enhancer in the latent space to convert amateur
vocal tone to professional. ConTuner achieves a satisfactory beautification
effect on both Mandarin and English songs. An ablation study demonstrates that the
expressiveness enhancer and the generator-based acceleration method in ConTuner are
effective.
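
As a rough illustration of the quantities involved in pitch correction, the sketch below uses the standard equal-temperament conversion between MIDI note numbers and frequency, and measures how far a sung pitch deviates from a score note. It is not ConTuner's learned mapping from MIDI and spectrum envelope to pitch. The function names and the A4 = 440 Hz reference tuning are assumptions made for this example.

```python
import math

A4_HZ = 440.0   # assumed reference tuning for A4
A4_MIDI = 69    # MIDI note number of A4


def midi_to_hz(note: float) -> float:
    """Convert a MIDI note number to frequency in Hz (12-tone equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency in Hz to a (possibly fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def cents_off(sung_hz: float, target_note: int) -> float:
    """Signed deviation of a sung pitch from a score note, in cents.

    Positive values mean the singer is sharp, negative values mean flat.
    """
    return 100.0 * (hz_to_midi(sung_hz) - target_note)
```

For example, `cents_off(sung_hz, 69)` reports how sharp or flat a sung frame is relative to A4. A learned pitch predictor such as the one the abstract describes would presumably aim for a natural pitch contour around the score note rather than a hard snap to `midi_to_hz(note)`.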